cornclllcct 



1 



Mathematical Population Genetics: 



Lecture Notes 



Cornell University, 



June July, 2006 



Warren J Ewens 



2 



Preface 

These notes should, ideally, be read before the Cornell meeting 
starts. They are intended to give background material in mathemat- 
ical population genetics and also, in part, to form the background 
for some of the material given by other lecturers. At the very least, 
the first 27 pages should be read before the meeting. 

Some standard genetical terms will be used and it is assumed that 
the reader is familiar with the meanings of these. These terms in- 
clude gene, genotype, allele, (gene) locus, haploid, diploid, homozy- 
gote, heterozygote, heterozygosity, monoecious, dioecious, polymorphism, 
linkage, recombination. 

Introduction 

The historical background 

Population genetics is the subject growing out of the amalgamation 
of the Darwinian theory of evolution and the Mendelian hereditary 
system. It is crucial to many current areas of science. 

Darwin had no idea of the heredity mechanism, other than the 
vague concept that "children tend to be like their parents" . It was 
thus very risky of him to put forward his evolutionary ideas in the 
absence of having this knowledge. Fortunately his instincts were so 
good that he made very few errors when he did this. 

Perhaps the central theme in population genetics theory is the 
examination of the change in the genetic make-up of a population as 
time goes on as a result of selection, mutation, and similar factors. 
These notes are based on such an examination. However, they focus 
on the random changes in the genetic make-up of a population, 
due essentially to random sampling effects. The random choice of 
one of two genes to be passed on from parent to child, and the 
random events during life that ensure that two equally fit individuals 
do not necessarily have the same number of offspring, make this 
random, or stochastic, aspect of the theory an important component 
of population genetics theory. 

Clearly selection is a major factor in the Darwinian theory. How- 
ever, in these notes we assume that there are no selective differences 
between the different genotypes in a population - that is, we assume 
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selective neutrality throughout. On the other hand, much of the 
discussion in these notes relates to mutation. 

Although these notes describe stochastic processes in evolution- 
ary genetics, it is appropriate to start with a deterministic result 
that ignores the possibility of random changes in gene frequencies, 
since it is so important. 

The Hardy Weinberg law 

We consider a random-mating diploid population in which there is 
no concept of two separate sexes (that is, a monoecious population) 
which is so large that genotype frequency changes may be treated 
as deterministic, and focus attention on some gene locus "A", at 
which two alleles may occur, namely A\ and A 2 . Suppose that in 
any generation the proportions of the three possible genotypes at 
this locus, namely A\Ai, A\A 2 and A 2 A 2 , are X, 2Y, and Z, re- 
spectively. Since random mating obtains, the frequency of matings 
of the type A X A X x A 1 A 1 is X 2 , that of A 1 A 1 x A X A 2 is AXY, and 
so on. If there is no mutation and no fitness differentials between 
genotypes, elementary Mendelian rules indicate that the outcome 
of an A\A\ x A\A\ mating must be A\A\ and that in an indefi- 
nitely large population, half the A\A\ x A\A 2 matings will produce 
A\A\ offspring, and the other half will produce A\A 2 offspring, with 
similar results for the remaining matings. 

It follows that since A\A\ offspring can be obtained only from 
A\A\ x A\A\ matings (with overall frequency 1 for such matings), 
from A\A\ x A\A 2 matings (with overall frequency | for such mat- 
ings), and from A\A 2 x A\A 2 matings (with frequency \ for such 
matings), and since the frequencies of these matings are X 2 , AXY, 
AY 2 , the frequency X' of A\A\ in the following generation is 

X' = X 2 + \(AXY) + \(AY 2 ) = (X + Yf. (1) 

Similar considerations give the frequencies 2Y' of A\A 2 and Z 1 of 
A 2 A 2 as 

2Y' = \ (AXY) + \ (AY 2 ) + 2XZ + \ (AY Z) 

= 2(X + Y)(Y + Z), (2) 
Z' = \(AY 2 ) + \(AY Z) + Z 2 = (Y + Zf . (3) 
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The frequencies X" , 2Y" and Z" for the next generation are found 
by replacing X', 2Y' and Z', by X", 2Y" and Z" and X, 2Y and Z 
by X', 2Y' and in (l)-(3). Thus, for example, using (1) and (2), 

X" = (X' + Y') 2 
= {X + Yf 
= X', 

and similarly it is found that Y" = Y', Z" = Z'. Thus, the genotype 
frequencies established by the second generation are maintained in 
the third generation and consequently in all subsequent generations. 
Frequencies having this property can be characterized as those sat- 
isfying the relation 

(Y 1 ) 2 = X'Z'. (4) 
Clearly if this relation holds in the first generation, so that 

Y 2 = XZ, (5) 

then not only would there be no change in genotypic frequencies 
between the second and subsequent generations, but also these fre- 
quencies would be the same as those in the first generation. Popu- 
lations for which (5) is true are said to have genotypic frequencies 
in Hardy-Weinberg form. 

Since X + 2Y + Z = 1, only two of the frequencies X, 2Y and 
Z are independent. If, further, (5) holds, only one frequency is 
independent. Examination of the recurrence relations (l)-(3) shows 
that the most convenient quantity for independent consideration is 
the frequency x of the allele A\. 

This is an important result, since it shows that for diploid popula- 
tions such as those discussed above, it is sufficient to focus on allelic 
frequencies for much of the analysis. In doing this we will follow 
the conventions of population genetics and refer to these, somewhat 
illogically, as gene frequencies. 

These conclusions may be summarized in the form of a theorem: 

Theorem (Hardy-Weinberg). Under the assumptions stated, a pop- 
ulation having genotypic frequencies X (of AiAi), 2Y (of AiA 2 ) and 
Z (of A 2 A 2 ) achieves, after one generation of random mating, stable 
genotypic frequencies x 2 , 2x(l — x), (1 — x) 2 where x = X + Y and 
1 — x = Y + Z. If the initial frequencies X, 2Y, Z are already of the 
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form x 2 , 2x(l —x), (1 — x) 2 , then these frequencies are stable for all 
generations. The frequency x of the allele A\ remains unchanged is 
all generations. 

The important consequence of this theorem lies in the stability 
behavior. If no external forces act, there is no intrinsic tendency for 
any variation present in the population, that is, variation caused by 
the existence of the three different genotypes, to disappear. This 
result is of great importance for the Darwinian theory, but we do 
not dwell on it now. 

The Hardy- Weinberg law can be generalized in various ways, in 
particular to the case where more than two alleles are possible (for 
example in the ABO blood group system), but we do not consider 
these generalizations here. 

As stated in the above, the Hardy- Weinberg law assumes that 
the population considered is infinite in size, so that that random, 
or stochastic, changes in gene (more strictly, allelic) frequencies are 
not allowed. However, all population sizes are, of course, finite, and 
thus the stochastic aspect of evolutionary population genetics must 
be investigated. Prom now on, these notes focus entirely on 
stochastic processes in evolutionary genetics. 

The stochastic theory: Two alleles 

The "simple" Wright-Fisher model 
Basic theory 

It is necessary, in order to arrive at a theoretical estimate of the im- 
portance of the stochastic factor, to set up stochastic models which 
reasonably describe the behavior of a finite population. Perhaps 
more than in any other part of population genetics theory, the choice 
of a model is arbitrary, and no-one pretends that Nature necessar- 
ily follows at all closely the models we construct. Although they 
did not use the terminology of Markov chain theory, the methods 
used by Fisher (1922, 1930, 1958) and Wright (1931), in developing 
the model considered in this section, are in fact those of this the- 
ory and its close relative, diffusion theory. Here we present some of 
the conclusions of Fisher and Wright in the terminology of Markov 
chains. 
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We consider, as the simplest possible population of fixed 

size N. Suppose that the individuals in this population are mo- 
noecious, that no selective difference exist between the genotypes 
possible at the gene locus "A" under consideration, and that there 
is no mutation. Since each individual carries two genes at this locus 
(one maternally, the other paternally, derived) there are 2N genes 
in the population in any generation, and it is sufficient to center our 
attention on the number X of A\ genes. Clearly in any generation 
X takes one or other of the values 0, 1, . . . , 2N ' , and we denote the 
value assumed by X in generation t by X(t). 

We must now assume some specific model which describes the 
way in which the genes in generation t + 1 are derived from the genes 
in generation t. Clearly many reasonable models are possible and, 
for different purposes, different models might be preferable. Nat- 
urally, biological reality should be the main criterion in our choice 
of model, but it is inevitable that we consider mathematical conve- 
nience in this choice. The model discussed below, although it was 
not written down explicitly by Fisher and Wright, was clearly known 
to them both, since they both gave several formulas deriving from 
it. 

The model assumes that the genes in generation t + 1 are derived 
by sampling with replacement from the genes of generation t. This 
means that the number X(t + 1) of A\ genes in generation t + 1 is a 
binomial random variable with index 2N and parameter X(t)/2N. 
More explicitly, given that X(t) = i, the probability Pij that X(t + 
1) — j is assumed to be given by 



We refer to the model (6) as the "simple" Wright-Fisher model, 
since it does not incorporate selection, mutation, population subdi- 
vision, two sexes or any other complicating feature. The purpose 
of introducing it is to allow an initial examination of the effects of 
stochastic variation in gene frequencies, without any further com- 
plicating features being involved. More complicated models that 
introduce factors such as selection and mutation, and which allow 
more than two alleles, but which nevertheless share the binomial 
sampling characteristic of (6), will all be referred to generically as 
'Wright-Fisher" models. 




2N-j 



i,j = 0,l,2,...,2N. (6) 
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In the form of (6), it is clear that X(-) is a Markovian random 
variable with transition matrix P = {pij}, so that in principle the 
entire probability behavior of X(-) can be arrived at through knowl- 
edge of P and the initial value X(0) of X. In practice, unfortunately, 
the matrix P does not lend itself readily to simple explicit answers to 
many of the questions we would like to ask, and we shall be forced, 
later, to consider alternative approaches to these questions. 

On the other hand, (6) does enable us to make some comments 
more of less immediately. Perhaps the most important is that what- 
ever the value X(0), eventually X(-) will take either the value or 
2N, and once this happens there will be no further change in the 
value of X(-). These are absorbing states of the Markov chain (6). 
Genetically this corresponds, of course, to the fact that since the 
model (6) does not allow mutation, once the population is purely 
A2A2 or purely A1A1, no variation exists, and no further evolution 
is possible at this locus. It was therefore natural for both Fisher 
and Wright to find the probability of eventual fixation of A 1 rather 
than A 2 and, perhaps more important, to attempt to find how much 
time might be expected to pass before fixation of one or other allele 
occurs. 

It is easy enough to see that the answer to the first question is 
X(0)/2N. This conclusion may be arrived at by a variety of meth- 
ods, the one most appropriate to Markov chain theory being that 
the solution ttj = j/(2N) satisfies the standard Markov chain fixa- 
tion probability difference equations and the appropriate boundary 
conditions. Setting j = X(0) leads to the required solution. A sec- 
ond way of arriving at the value X(0)/2N is to note that X(-)/2N 
is a martingale, that is satisfies the "invariant expectation" formula 

E{X(t + l)/2N I X(t)} = X(t)/2N, (7) 

and then use either martingale theory or informal arguments to 
arrive at the desired value. A third approach, more informal and 
yet from a genetical point of view perhaps more useful, is to observe 
that eventually every gene in the population is descended from one 
unique gene in generation zero. The probability that such a gene is 
Ai is simply the initial fraction of A\ genes, namely X(0)/2N, and 
this must also be the fixation probability of A\. 

It is far more difficult to assess the properties of the (random) 
time until fixation occurs. The most obvious quantity to evaluate 
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is the mean time i{X(0)} taken until X(-) reaches or 2N, start- 
ing from X(0). As it happens, no simple explicit formula for this 
mean time is known, although a simple approximation, given later, 
is available. Fisher and Wright, no doubt noting this difficulty, paid 
comparatively little attention to the mean fixation time, concentrat- 
ing on an approach centering around the leading nonunit eigenvalue 
of P. It follows immediately from (6) that if we put x(t) = X(t)/2N, 



E(x(t + l){l- x (t + l)}\ x(t)) = {1 - (2N)- 1 }x(t){l - x(t)}, (8) 



so that the expected value of the heterozygosity measure 2x(-){l — 
x(-)} decreases by a factor of 1 — (2iV) _1 each generation. It follows 
immediately that 1 — (2N)~ 1 is an eigenvalue of the matrix P, and 
it is easy to show that it is the leading nonunit eigenvalue. We write 
the right and left eigenvectors corresponding to this eigenvalue as 
r = (r , ri, r 2 , . . . , r 2N ), and £' = (£ , £i,£ 2 ,---, ?2n) respectively. It 
follows from (8) that r' is proportional to the vector 



{0, 2N - 1, 2(2iV - 2), 3(2iV - 3), . . . , 2N - 1, 0}. (9) 



Unfortunately, no such simple formula is known for the left eigenvec- 
tor £. If we suppose that £ and r are normalized by the requirements 



Pij (t) = Prob{X(t) = j | X(0) = i} 

= riijil - (2Ny 1 } t + o{l - {2N)' 1 } t fort large.(ll) 



Equations (8) and (11) jointly provide much interesting information. 
It is clear that especially in a large population, the mean heterozy- 
gosity of the population decreases extremely slowly with time as a 
result of the random sampling effects implicit in the model under 
consideration. We conclude that although genetic variation must 
ultimately be lost under the model (6), the loss is usually very slow. 
This slow rate of loss may be thought of as a stochastic analogue 
of the "variation-preserving" property of infinite populations shown 
by the Hardy-Weinberg law. This conclusion can be generalized, 
taking into account complications brought about through variation 
in the population size, through geographical factors, through the 
existence of two sexes, and so on. 




(10) 



then standard spectral theory shows that 
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An ergodic argument 

Suppose that, in an otherwise pure A 2 A 2 population, a single new 
mutant A 1 gene arises. No further mutation occurs, so from this 
point on the model (6) applies. How much time will pass before 
the mutant is lost (probability 1 — (2N)' 1 ) or fixed (probability 
(2A^) -1 )? The mean number of generations t\ for one or other of 
these events may be written in the form 

2JV-1 

where tij is the mean number of generations that the number of A\ 
genes takes the value j before reaching either or 2N. Both Fisher 
and Wright found that 

t 1 , j ^2j~\ j = l,2,...,2iV-l, (13) 

so that, using (12), 

ti«2(log(2JV-l)+ 7 ), (14) 

where 7 is Euler's constant 0.5772 . . .. 

There is an ergodic equivalent to the expressions in (12) and (13) 
which is perhaps of more interest than (12) and (13) themselves, and 
which is indeed the route by which Fisher arrived at these formulae. 
Consider a sequence of independent loci, each initially pure U A 2 A 2 " , 
and at which a unique mutation A\ occurs in generation k in the kth 
member of the sequence. We may then ask how many such loci will 
be segregating for A\ and A 2 after a long time has passed, and at 
how many of these loci will there be exactly j 11 A\' genes. It is clear 
that the mean values of these quantities are t\ and iij, respectively, 
and this gives us some idea, at least insofar as the model (6) is 
realistic, of how much genetic variation we may expect to see in any 
population at a given time. The question of the amount, and the 
nature, of the genetic variation that can be expected in a population 
at any given time is of great interest to geneticists, and will be taken 
up later at much greater length. 

Conditional processes 

Consider now only those cases for which the number of Ai genes 
eventually takes the value 2N. What is the transition matrix of the 
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conditional process when the condition is made that eventually this 
happens? 

We assume that the initial value of X(-) = i. Recalling that we 
write X(t) for the number of A 1 genes in generation t, we get 

p*. = Prob{X(t + 1) = j | X(t) = i and eventuallyX(-) = 2N} 
= Prob{X(t +1) = j and eventuallyX(-) = 2N | X(t) = i} 

4- Prob{eventuallyX(-) = 2N \ X(t) = i} 
= Pijj/i, = 1,2,..., 2N). (15) 

Here Pij is the term in the Wright-Fisher transition matrix (6) 
and we have used the fact that when X(-) = i, the probability that 
eventually X(-) = 2N is i/(2N), as well as standard conditional 
probability arguments and the Markovian nature of X(-). Let P be 
the matrix derived from the Wright-Fisher transition matrix P by 
omitting the first row and first column and let 



V 



/TTi \ 



V n 2N / 



(16) 



Then if P* = {p*^}, Eq. (15) shows that 

p* = V^PV. (17) 

Standard theory shows that the eigenvalues of P* are identical to 
those of P (with one unit eigenvalue omitted) and that if £'(r) is any 
left (right) eigenvector of P, then the corresponding left and right 
eigenvector of P* are £'V and V~ 1 r. Further, if P*( n ) is the matrix 
of conditional n step transition probabilities, 

p*(n) _ ^p*yi _ Y~ 1 P n V 

so that 

PT = (18) 

a conclusion that can be reached directly. If is the conditional 
mean time, measured in generations, that X(-) = j, given initially 
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X(0) = i, then 

oo 

= 

n=0 

oo 

= 07i)E# ( 19 ) 

n=0 

Further theoretical results 

In this section we consider the mean time ij until an absorbing state 
(X(-) = or 2iV) of the Markov chain describing the simple Wright- 
Fisher model, given that initially the number X(0) of A\ genes is i, 
and will also consider the mean number of times Uj that X(-) takes 
the value j before an absorbing state is reached. While in principle 
these expressions can be found from standard theory, in practice 
solution of the equations that arise seems extremely difficult, and 
simple expressions for these mean times have not yet been found. 
On the other hand, it is possible to find a simple approximation for 
ij by the following line of argument. 

We put i/M = x, j/M = x + 5x, and ij = t(x), and suppose t(x) 
is a twice differentiate function of a continuous variable x. Then 
from standard theory, 

t( x ) = J^Prob{a; ^ x + 5x}i(x + 5x) + 1 (20) 
= E{i(x + 5x)} + l (21) 
w i(x) + E(5x){t(x)}' + ±E(5x) 2 {t(x)}" + I. (22) 

In the expression (22) the random variable is Sx, all expectations 
are conditional on x and only the first three terms in an infinite 
Taylor series have been retained. Since from (6) 

E(8x)=0, E(5x) 2 = (2N)~ 1 x(l -x), 

Eq. (22) gives 

x(l - x){t(x)}" w -AN. (23) 

The solution of this equation, subject to the boundary conditions 
t(0) =i(l) = 0, is 



t(p) m -AN{p\ogp+ (1 -p)log(l -p)}, (24) 
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where p = i/2N is the initial frequency of A\. This can be shown 
to be a very accurate approximation to the true, but to this day 
unknown, mean fixation time. 

(The value given in (24), and various other approximations given 
in these notes, are in effect "diffusion approximations", that is ap- 
proximate expressions found by approximating a Markov chain by 
a diffusion process. Any reference to approximations in these notes 
refer to such approximations. Diffusion theory will be discussed in 
these lectures by Dr Griffiths.) 

In the case % — 1, so that p = (2AQ -1 , the value appropriate if 
Ai is a unique new mutation in an otherwise pure A 2 A 2 population, 
Eq. (24) reduces to 

t{(2iV)" 1 } « 2 + 21og2iV generations, (25) 

while when p = \, 

m 2.8N generations. (26) 

The value given in (25) is very close to Wright's and Fisher's ap- 
proximation given in (14). 

Suppose now the condition is made that A\ eventually fixes. The 
possible values for X are 1,2,3,..., 2N and Eq. (15) shows that the 
conditional transition probability p*j is 

'2N\ ( J_y ( 2N-i \ 2N ~ j j 



P 



J J \2N J \ 2N J i 
2N-1\ ( i V 1 f2N-i\ 2N - ] 



J - 1 J V 2N V 2iV 



(27) 



An intuitive explanation for the form of p*- is that, under the con- 
dition that A 1 fixes, at least one Ai gene must be produced in each 
generation. Then p*j is the probability that the remaining 2N — 1 
gene transmissions produce exactly j — 1 A\ genes. An argument 
parallel to that leading to (23) gives 

(1 - x){t*(x)}' + \x{l - x){t*{x)}" = -2N (28) 

for the conditional mean time i*(x) to fixation, given a current fre- 
quency of x. The solution of (28), subject to t*(l) = and the 
requirement 

limf(x) is finite, (29) 
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and assuming initially x = p, is 



4Np-\l-p) log(l -p). 



(30) 



We observe from this that 



i* {(2iV)- 1 } w 4N - 2 generations, 

w 2.8A^ generations, 
t* {1 - (2iV) _1 } ~ 21og2iV generations. 



(31) 
(32) 
(33) 



A result equivalent to (31) is that, given initially 2N -1 A 1 genes, 
the conditional mean number of generations until, loss of A±, given 
that such loss will occur, is approximately 



Eq. (32) is to be expected from Eq. (26), since by symmetry, 



no effect on the mean fixation time. On the other hand, Eq. (31) 
and Eq. (33) provide new information and show that, while when 
the initial frequency of A\ is (2N)~ l it is very unlikely that fixation 
of Ai will occur, in the small fraction of cases when fixation of A 1 
does occur, an extremely long fixation time may be expected. 

As noted earlier, the initial analysis of the model (6) by Fisher 
and Wright paid particular attention to the leading eigenvalue of the 
transition matrix, regarded as a measure of the rate at which one or 
other allele is lost from the population. Although the eigenvalues of 
the transition matrix in (6) are of less use than expressions like (24) 
and (30) for investigating the length of tome that both alleles may be 
expected to remain in the population , they are nevertheless of some 
interest, so we now write down the formulae for these eigenvalues. 

Since the matrix defined by the pij in (6) is the transition ma- 
trix of a Markov chain, it follows that one eigenvalue of the matrix 
is automatically 1. Denoting this eigenvalue by Ao, the remaining 
eigenvalues, first derived by Feller (1951), are 



\ j = (2N)(2N-l)...(2N-j + l)/(2NY, j = 1, 2, . . . , 2N. (35) 



This confirms the values Ai = 1 and A 2 = 1 — (2-/V) -1 found earlier by 
other methods. We derive the eigenvalues in (35) later as particular 
cases of those for the Cannings model. 



AN — 2 generations. 



(34) 



when the initial frequency of A 1 is 



|, the conditioning should have 
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Mutation in the Wright-Fisher model 
One-way mutation 

Suppose now that A 1 mutates to A 2 at rate u but that there is no 
mutation from A 2 to A 1 . It is then reasonable to replace the model 
(6) by 

Pij= ( 2 f) W(1-^) 2JV - J ' (36) 

where ipi = i(l — u)/2N. Here it is certain that A\ will be eventually 
lost from the population, and interest centers on properties of the 
time until this loss occurs, either using eigenvalues or mean time 
properties. An argument parallel to that leading to (23) shows that, 
to a first approximation, the mean time i(x) for the loss of A\, given 
a current frequency x, satisfies 

-4Nux{i(x)}' + x(l - x){t(x)}" = -AN. (37) 

If initially x — p, the solution of this equation, subject to the re- 
quirements t(0) = 0, and 

lim t(x) is finite, 

x— »1 

is 

i 

t(p) = J t{ x iP) generations, (38) 
o 

where 

t(x,p) = ANx-\l-e)- 1 {(l-x) e - 1 -I}, 0<x<p, 

t(x, P ) = 4A^x- 1 (i - ey 1 ^ -x) e - l {\ - (i -pY~ e }, p<x<i, 

(39) 

and 9 = 4Nu. (Formulae for the case = 1 are found from (39) by 
standard limiting processes.) 

It may be shown that with the definition of t(x,p) in (39), i(p) 
may be written as 

oo A AT 
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The function t(x,p) in (39) is more informative than it initially 
appears since, as is shown later, t(x,p)8x provides an excellent ap- 
proximation to the mean number of generations for which the fre- 
quency of A 1 takes a value in (x, x + 5x) before reaching zero. 

There are two interesting special cases of (39). First, when 9 = 2, 



t(x,p) = AN, 0<x<p, 
t(x,p) = 4iVar 1 (l - x){(l -p)' 1 - 1}, p<x<l, 

and from this, 



(41) 



i(p) = = 4 ^, (42) 

a conclusion that can also be found directly from (40). Second, when 
p = 1, Eq. (40) gives immediately 

A slightly more accurate approximation is 

2N 

AN + 9 - l)} -1 generations. (44) 

i=i 

The case 9 = 2 is of some interest. For this value of 9 the expression 
in (44) reduces to 

AN — 2 (45) 

generations. This is identical to the conditional mean loss time, 
given initially 2N — 1 genes of the allele A 1 , as given in (34). The 
reason why the unconditional mean time in the mutation process 
and the conditional mean time in the non-mutation process are es- 
sentially identical for the case 9 = 2 is that the entire properties of 
the two processes are, perhaps surprisingly, essentially identical. 

Two-way mutation 

Suppose next that A 2 also mutates to A\ at rate v. It is now rea- 
sonable to define fa in (36) by 



fa = {i(l - u ) + (2N - i)v}/2N. 



(46) 
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There now exists a stationary distribution <fi' = (0o, 4>i, ■ ■ ■ , 4>2n) for 
the number of Ai genes. No simple explicit form of this distribution 
is known. On the other hand, certain properties of this distribution 
can be extracted immediately from (36) and (46). The stationary 
distribution satisfies the equation <p' = 4>'P, where P is defined 
by (36) and (46), so that if £ is a vector with ith element i (i = 
0, 1, 2, . . . , 2iV) and ji is the mean of the stationary distribution, 

The ith (i = 0, 1, 2, . . . , 2N) component of P£ is 

E^'(7)^ (1 -^ )2JV " 

and from the standard formula for the mean of the binomial distri- 
bution, this is 2Nipi or 

i{\ -u) + (2N-i)v. 

Thus, 

<t>'Pt = - u) + (2N - i)v} ai 

= fi(l -u) +v(2N-n). 

It follows that 

H= (1 _ M ) ;(i + t ;(2A^-/i) 

or 

H = 2Nv/(u + v). (47) 

Similar arguments show that the variance a 2 of the stationary 
distribution is 

a 2 = 4N 2 uv I { [u + v) 2 (4Nu + 4Nv + 1 ) } + small order terms. (48) 

The above values are sufficient to answer the question: "what is the 
probability of two genes drawn together at random are of the same 
allelic type?" If the frequency of A\ is x and terms of order iV -1 arc 
ignored, this probability is x 2 + (1 — x) 2 . The required value is the 
expected value of this with respect to the stationary distribution, 
namely 

E{x 2 + (1 - x) 2 } = 1 - 2E(ar) + 2E(x 2 ). 
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If u = v, ANu = 9, Eq. (47) and Eq. (48) show that E(x) = \ and, 
to a close approximation, E(x 2 ) — \ + 4( - 2g 1 +1 - ) . Thus 

Prob (two genes of same allelic type) w (1 + 0)/(l + 26*). (49) 

This probability can be arrived at in another way, which we now 
consider since it is useful for purposes of generalization. Let the 
required probability be F and note that this is the same in two 
consecutive stationary generations. Two genes drawn at random 
in any generation will have a common parent gene with probability 
(2iV) -1 , or different parent genes with probability 1 — (2A^) _1 , which 
will be of the same allelic type with probability F. The probability 
that neither of the genes drawn is a mutant, or that both are, is 
u 2 + (1 — u) 2 , while the probability that precisely one is a mutant is 
2u(l — u). It follows that 

f = K + (i-«.) 2 H^ + f(i-^)} 

+ 2»(l-»)(l-F)(l-i). 



Thus exactly 

and approximately 



1 + 2u(l -u){2N -2) 
1 + 4u(i -u)(2N- 1)' 



F = (l + 0)/(l + 20), (50) 
in agreement with (49). 



The Cannings (Exchangeable) Model 
No mutation 

An important generalization of the Wright-Fisher model was intro- 
duced by Cannings (1974). As with the Wright-Fisher model, this 
model considers a "population" of genes of fixed size 2N, reproduc- 
ing at time points t — 0, 1, 2, 3, . . . . The stochastic rule determining 
the population structure at time t + 1 is quite general, assuming 
only that any subset of genes at time t has the same distribution of 
"descendant" genes at time t + 1 as any other subset of the same 
size. More precisely, if the ith gene leaves yi descendant genes, it 
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is required only that yi + . . . + y2N = 2iV and that the distribution 
of (t/i, i/j, . . . , y k ) be independent of {i, j, . . . , k}. In particular all 
genes must have the same offspring probability distribution. This 
distribution must have mean 1 and we denote the variance of this 
distribution by a 2 . This interpretation of a 2 is used through- 
out these notes when Cannings models are considered. In 
some Cannings models a gene present at time t can also be present 
at time t + 1, and is then counted as one of its own descendants. An 
example of this is discussed later. 

Our first calculation concerning the Cannings model relates to 
eigenvalues. Let the genes be divided into two allelic classes, A 1 
and A 2 , and let X t be the number of A 1 genes at time t. Then 
Cannings' first theorem is as follows:- 

Theorem (Cannings). (No proof). If 

Pij = Prob{X m =j\X t = 0, i,j = 0,1,2,..., 2N, 

then the eigenvalues of the matrix {pij} are 

A = l, Aj = E(y 1 y 2 . . . yj), j = 1, 2, . . . , 2N. (51) 

The Wright-Fisher model (6) is clearly a particular case of the 
Cannings model, since in the Wright-Fisher model (6) (y±, y 2 , ■ ■ ■ , V2n) 
have a symmetric multinomial distribution. (However, the Cannings 
model is far more general, and thus realistic, than the Wright-Fisher 
model.) This implies that if we write 

(27V)! _ fn\ 

//,!//,!... //.iCi.V //, ... //; )! " VyJ' 

the eigenvalue Xj,j = 1,2, ...,27V for the simple Wright-Fisher 
model is given by 

= (2N)(2N-l)...(2N-j + l)/(2N) j . (52) 

This confirms the values given in (35), found originally by Feller 
(1951), using other methods. 

The theorem shows that, for the Cannings model, the leading 
non-unit eigenvalue is A2 = E(y 1 y 2 ) where y^ is the number of de- 
scendent genes of the ith gene in the population. Now yj = 2N, 
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so that the variance of i s 0- Then by symmetry, 

2N var(^) + 2N(2N - 1) covar(y,, Vj ) = 0. 
This implies that 

covar( yi , % ) = -oT 2 /(27V-l), (53) 

where a 2 = var(j/j). Immediately then, 

A 2 = E(yiy 2 ) 

= Covar(y!, y 2 ) + E(y!)E(y 2 ) 

= l-a 2 /(2N -1). (54) 

To confirm this formula we observe that, in the Wright-Fisher model, 
Hi has a binomial distribution with index 2N and parameter (2N)~ 1 . 
Thus for this model, 

A 2 = 1 - {1 - (2N)~ 1 }/(2N - 1) = 1 - (2A0- 1 , 

agreeing with the 11 j = 2" case in the expression in Eq. (52). 

Other properties of the Cannings model follow easily. For exam- 
ple, it is clear by symmetry that the probability of eventual fixation 
of any allele in such a model must be its initial frequency. Further, 
suppose that there are X(t) A\ genes in the Cannings model at time 
t, and write X(t) = i for convenience. If we relabel genes so that 
the first i genes are Ai, 

Var{X(t + l) \X(t)} = Var( yi + ... + </,) 

= ia 2 + i(i — 1) Covar (7/1,2/2) 

= i(2N -i)a 2 /(2N -1), (55) 

from Eq. (53). If x(t) = X(t)/2N, it follows that 

v&r{x(t + 1) I x(t)} = x{t){l - x(t)}a 2 /(2N - 1). (56) 

Mutation 

Suppose now that the genes in the population are divided into two 
allelic types, A 1 and A 2 , and that if mutation does not exist the 
conditions for Cannings' first theorem hold. Now assume that A\ 
mutates to A 2 at rate u, with reverse mutation at rate v. Write 
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Xi = Hi + Zi, where yi = 1 or depending on whether or not the ith 
gene at time t continues to exist at time t + 1. Thus, yi — in the 
model (36), but we are considering now more general conditions than 
those specified by this equation. The variable Z; L is the number of 
offspring genes from the ith gene at time t. If this gene is of type Ai, 
define zn as the (random) number of its A\ (that is, non-mutated) 
offspring: zn has a distribution which depends on Zi. Similarly if 
the ith gene is of type A 2 let z i2 be the random number of its A\ 
(that is mutant) offspring. Then we have 

Theorem 2 (Cannings). The eigenvalues of the matrix P describing 
the stochastic behavior of the number of A\ genes are Ao = 1 and 

\j = pr ( z i> • • • - z o) | e + ^ - z i2\ zi,..., I , 

(j = 1,2,..., 27V). (57) 

In the Wright-Fisher model defined by (36) and (46), yi = and 
Z\ . . . Zj have a multinomial distribution with index IN and com- 
mon parameter (2JV) -1 . Further, given Zi, zn and Zi 2 have binomial 
distributions with respective parameters 1 — u and v. Thus 

E(zn - z i2 | = (1 - u - v)zi 

and 

Xj = P?{zi, . . . , Zj)(l — u — v)i Zi ■ ■ ■ Zj 

= (1 - u - v) j E( Zl . . . Zj) (58) 
= (1 — u — v) j {2N{2N - 1) . . . (2iV - j + 1)/(2A0*}, 
j = l,2,...,2iV. 

The conclusion of (52) has been used in reaching this formula. The 
leading non-unit eigenvalue Ai is 1 — u — v and is thus independent of 
N. This is extremely close to unity and suggests a very slow rate of 
approach to stationarity in this model. The eigenvalues (58) apply 
also in the one-way mutation model, for which we simply put v — 
in (58). 
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The Moran Model 
No mutation 

In this section we consider a model due to Moran (1958) which, in 
contrast to the Wright-Fisher model and the Cannings model, has 
the advantage of allowing explicit expressions for many quantities of 
evolutionary interest, although, strictly, it applies only for haploid 
populations (where each individual has only one gene, rather than 
two, at the "A" locus under consideration). 

Consider a haploid population in which, at time points t — 1, 2, 
3, . . ., an individual is chosen at random to reproduce. After repro- 
duction has occurred, an individual is chosen to die (possibly the 
reproducing individual but not the new offspring individual). As is 
discussed later, the model can be generalized by allowing mutation. 

We consider first the simplest case where there is no mutation. 
Suppose the population consists of 2N haploid individuals (we use 
this notation to allow direct comparison with the diploid case), each 
of whom is either A\ or A^. Suppose also that, at time t, the number 
of A\ individuals is %. Then at time t + 1 there will be i — 1 A\ 
individuals if an A 2 is chosen to give birth and an A 1 individual is 
chosen to die. The probability of this, under our assumptions, is 

p l>l . 1 = t(2N-t)/(2N) 2 . (59) 

Similar reasoning shows that 

p i>i+1 = i{2N -i)/{2N)\ (60) 

p hl = {e + {2N-tf}/{2Nf. (61) 

The matrix defined by these transition probabilities is a continu- 
ant, so that standard theory of can be applied to it. In standard 
continuant "birth-and-death" notation, 

A, = m = i{2N - i)/(2N) 2 , i = 0,1, 2,..., 27V. (62) 

It follows that the probability 7Tj of fixation of A±, given currently i 
Ai individuals, is 

Tii = i/2N, (63) 
and that, using notation developed above, 

Uj = 2N(2N-i)/(2N-j), j = 1,2,...,*, 

Uj = 2Ni/j, j = i + l,...,2N-l. (64) 
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Thus immediately 



2JV-1 



ti 



2N(2N - i) ]T(2iV - J)' 1 + 2Ni ]T 



(65) 




2N(2N - i)j/{i(2N - j)}, j = 1, 2, ... ,i, 
2N, j = i + l,...,2N-l, 



(66) 



t* = 2N(2N -i)r 1 ^2j(2N -j)' 1 + 2N{2N-i- 1).(67) 



An interesting example of these formulae is the case i = 1, corre- 
sponding to a unique A x mutant in an otherwise purely A 2 popu- 
lation. Here t\j = 2N for all j so that, given that the mutant is 
eventually fixed, the number of A 1 genes takes, on average, each 
of the values 1,2, ... , 2N — 1 a total of 2N times. The conditional 
mean fixation time is given by 



birth-death events. The variance of the conditional absorption time 
can also be written down but we do not do so here. 

The eigenvalues of the matrix defined by (59) - (61) can be found 
by using Cannings' first theorem. Take any collection of j genes and 
note that the probability that one of these is chosen to reproduce 
is j/2N, with the same probability that one is chosen to die. For 
this model a gene can be (and indeed usually is) one of its own 
"descendants". Using the notation of Canings' first theorem, the 
product yit/2 ■ ■ - Uj can take only three values: 

if one of these genes is chosen to die and the gene so chosen is 
not chosen to reproduce, 

2 if one of the genes is chosen to reproduce and none is chosen 
to die, 

1 otherwise. 
Thus A = 1 and 



t\ = 2N(2N - 1) 



(68) 



Aj = E(y ± y 2 ... 

= 2j(2N - j)/(2N) 2 + 1 — j (4N — j — 1)/(2N) 2 

= l-j(j-l)/(2N) 2 , j = 1,2,..., 2N. (69) 
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Various expressions for the corresponding eigenvectors have been 
given. We are particularly interested in the largest non-unit eigen- 
value and its associated eigenvectors. The required eigenvalue is 



and elementary calculations show that the corresponding right eigen- 
vector r and left eigenvector £' are 



r = (0,1(2N -1),2(2N -2),...,i(2N -i),...,l(2N-l),0)' 
C = (-|(27V-1),1,1,1,...,1,-|(27V-1)). 



Thus the asymptotic distribution of the number X t of A\ genes for 
large t, given X t ^ 0, 2N, is uniform over the values {1, 2, 3, ... , (2N— 
1)}. The fact that A 2 is very close to unity for large N agrees with 
the very large mean absorption times (65) for large N and interme- 
diate values of i. 

Mutation 

If mutation from A\ to A 2 is allowed (at rate u), with no reverse 
mutation, A 1 must eventually become lost, and interest centers on 
properties of the time for this to occur. The model is now amended 
to 



Standard continuant matrix theory can now be used to find iij and 
thus ii. We do not present explicit expressions since it will be more 
useful (see below) to proceed via approximations. If mutation from 
A 2 to Ai (at rate v) is also allowed, the model becomes 



p M _i = {i(2N -i){l-v)+w 2 }/{2N) 2 = m 

= {*(2AT-*)(l- M )+t;(2iV-*) 2 }/(2A0 2 = A, (71) 



Pi,i = 1 ~Pi,i-i ~Pi,i+i- 

The typical value <j)j in the stationary distribution for the number 
of Ai genes is found from standard theory to be 



A 2 = 1 - 2/(2iV) 2 , 



(70) 



Pi,i-\ 
Pi,i+i 



Pi,i 



{i(2N -i) +ui 2 }/{2Nf = Hi 
i{2N -u)/{2Nf = Xi 

1 ~Pi,i-i ~Pi,i+i- 



4>j = 00 



(2N)\T{j + A}T{B - j} 
j\(2N - j)\r{A}F{B} 



(72) 
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where is the gamma function, A = 2Nv/(l — u — v), B = 
2N(1 - v)/(l -u-v),C = 2Nu/{\ -u-v), D = 2N/(1 - u - v) 
and 0o = T{B}Y{A + C}/[T{D}Y{C}}. Although these expressions 
are exact they are rather unwieldy, and we consider in a moment a 
simple approximation to <f)j. 

The Markov chain defined by (71), having a stationary distribu- 
tion and a continuant transition matrix, is automatically reversible. 
This is not necessarily true for other genetical models: for exam- 
ple it can be shown that the Wright-Fisher Markov chain defined 
jointly by (36) and (46) is not reversible. What does reversibility 
mean in genetical terms? All the theory we have considered so far 
is prospective, that is, given the current state of a Markov chain, 
probability statements are made about its future behavior. Recent 
developments in population genetics theory often concern the ret- 
rospective behavior: the present state is observed, and questions 
are asked about the evolution leading to this state. For reversible 
processes these two aspects have many properties in common, and 
information about the prospective behavior normally yields almost 
immediately useful information about the retrospective behavior. 
We shall see later how the identity of prospective and retrospective 
probabilities can be used to advantage in discussing various evolu- 
tionary questions. 

The eigenvalues of the matrix defined by (71) can be found by 
applying Cannings' second theorem. Here y% = \ unless the ith gene 
has been chosen to die, in which case yi = 0. Similarly Zi, zn and 
Zi2 are zero unless the ith gene has been chosen to reproduce. It is 
found after some calculation that Ao = 1 and 

A,-l {2N) {2N)2 , J-1,-..,^V. [16) 

These eigenvalues apply also in the case v — 0. The leading non-unit 
eigenvalue is 1 — (u+v)/(2N), and since 2N time units in the process 
we consider may be thought to correspond to one generation in the 
Wright-Fisher model, this agrees closely with the value 1 — u — v 
found in (58) for that model. 

Some approximations 

Several of the exact results found above for the Moran model are 
unwieldy, so we now give simple approximate expressions for them. 
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For the case where there is no mutation, is evident from Eq. (65) 
that 

t(p) w -(2iV) 2 {plogp+ (1 -p)log(l -p)}, (74) 

where p = i/2N. The similarity between this formula and (24) is 
interesting. A factor of 2N may be allowed in comparing the two 
to convert from birth-death events to generations. There remains 
a further factor of 2 to explain, and we show later why this factor 
exists. 

In the case of one-way mutation, approximate values for iij may 
be calculated from (71), and from these we obtain an approximate 
value for L. This is 



v 

U w (2Ny(l-6)- L [ [ x^iil-x) - 1 -l}dx 



1 

+ J x- 1 (l-x) - 1 {l-(l-p) 1 - }dx) 



(75) 



v 

birth-death events, where p = i/(2N), x = j/(2N) and 6 is defined 
for the (diffusion) approximation to this Moran model as 2Nu. In 
the particular case p = (2N)" 1 this is, to a close approximation, 

p 

U^2N(l+ J x-^l-xf-Ux) (76) 



(2JV) 



-i 



birth-death events. When = 1 the form of ti may be found by 
application of L'Hospital's rule. 

For the case of two-way mutation we put x = j/(2N), u = 
a/(2N), v = (3/{2N) in (72)and let j and 2N increase indefi- 
nitely with x, a and j3 fixed. Using the Stirling approximation 
T{y + a}/T{y} ~ y a for large y, moderate a, the stationary proba- 
bility 4>j m (72) becomes, approximately, 

b-w-'mm^ 1 -*^ (77) 

at least for values of x not extremely close to or 1. This approxi- 
mation expression is far simpler than the exact value (72). 
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if-allele Models 

The models considered so far can easily be extended to allow K 
different alleles at the locus in question, where K is an arbitrary 
positive integer. (For the ABO blood group system, for example, 
K = 3.) In this case the population configuration at any time can 
be described by a vector (Xi,X 2 , . . . ,Xk), where Xi is the num- 
ber of genes of allelic type A*. If we assume, as is usual, that 
Xi + X 2 + . . . + X K = 2N, only K — 1 elements in the above vector 
are independent. It is however convenient, for reasons of symmetry, 
to retain all elements in the vector. The most interesting cases of 
these models arise when there is no mutation and the K allele gen- 
eralization of the Wright-Fisher, the Cannings or the Moran model 
determines the evolution of the population. In this case any allele 
A,i can be treated on its own, all other alleles being classed simply 
as non-Aj, and much of the above theory can be applied. (On the 
other hand, one problem for which the above theory is inadequate 
is to find the mean time until loss of the first allele lost, the mean 
time until loss of the second allele lost, and so on. This is a more 
complex problem that we do not discuss.) 

We consider in detail only the if-allele generalization of the model 
Wright-Fisher (6), namely 

Pr{Yj genes of allele Ai at time t + 1 | Xi genes of allele 
i at time t, i — 1,2, ... , K} 



;il>rw 2 ...^k (78) 

Y 1 \Y 2 \...Y K r 1 

where ^ = Xi/(2N). In this case the model (78) is in effect a 
Cannings model and a straightforward generalization of the theory 
for the Cannings model given above can be applied. 

The eigenvalues of the matrix defined by (78) are precisely the 
values in Eq. (52), where now Xj has multiplicity (K+j — 2)\/{(K — 
2)!/j!}, (j = 2, 3, . . . , 2N). The eigenvalue Ao = 1 has total mul- 
tiplicity K. These eigenvalues have the interesting interpretation 
(Littler, (1975)) that 

Pr{at least j allelic types remain present at time t} ~ const A*. 

(79) 

When mutation exists between all alleles there will exist a multi- 
dimensional stationary distribution of allelic numbers. The means, 
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variances and covariances in this distribution can be found by pro- 
cedures analogous to those leading to (47) and (48). We consider 
in detail only the case where mutation is symmetric: here the prob- 
ability that any gene mutates is assumed to be u, and given that 
a gene of allelic type Ai has mutated, the probability that the new 
mutant is of type Aj is (K — l) -1 , (j ^ i). By symmetry, the mean 
number of genes of allelic type Ai alleles in the stationary distribu- 
tion must be 2N/K. However, it sometimes occurs that this is not 
a likely value for the actual number of genes of any allelic type to 
arise, and we see this best by finding the probability F that two 
genes taken at random from the population are of the same allelic 
type. Generalizing the argument that led to (50) we find, ignoring 
terms of order u 2 , that 

F = ((2N)- 1 +{l-(2N)- 1 }F)(l-2u) + (l-(2Ny 1 )(l-F)(2u/(K-l)). 
If we write 9 = ANu, this gives 

F ~ (K — 1 + 0)/ {K — 1 + K9). (80) 
This expression agrees with that in (50) for K = 2. For large K, 

F^(l + 6)-\ (81) 

an expression we return to later. 

These formulas demonstrates an important theme. In both for- 
mulas, if 9 is small, then F « 1. This implies that it is very likely 
that one or other allele appears with high frequency with the re- 
maining alleles having negligible frequency, despite the fact that all 
alleles are selectively equivalent. The imbalance arises because of 
stochastic effects, and is quite different from that predicted by con- 
sidering the mean allele frequencies only. 

The eigenvalues of the matrix defined by the symmetric mutation 
model are the values (52) if A« is multiplied by {1 — uK{K — l)" 1 } 1 . 
The multiplicity of \ is (i + K - 2)\/{i\(K - 1)!}. 

In view of the comments concerning the Cannings model made 
above it is plausible that Eqs. (80) and (81) hold with 9 defined 
by 9 = ANu/ a 2 . There is also a ii'-allele Moran model which allows 
various exact formulae, but we do not consider this here. 
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Infinitely Many Alleles Models 
Introduction 

In this section we consider the "infinitely many alleles" versions 
of the Wright-Fisher, the Cannings and the Moran models. The 
discussion of the Wright-Fisher model is more extensive than that 
for the remaining models. This is not because it is more important 
than the other two, but arises for two reasons. The first is that 
calculations for this model are comparatively straightforward, and 
the second is that results for this model can be taken over almost 
directly for the Cannings model, with an appropriate change in the 
definition of the parameter 9 arising in all the formulas found. This 
definition is given below in the section on the Cannings infinitely 
many alleles model. 

Results for the Wright-Fisher and the Cannings infinitely many 
alleles models are usually approximations. By contrast, the infinitely 
many alleles Moran model allows many exact calculations. 

Mutation is intrinsic to all infinitely many alleles models, but the 
nature of the new mutants is different from anything assumed so far, 
the key difference being that all mutant genes are assumed to be of a 
new allelic type, not currently or previously seen in the population. 
This has several important implications that are discussed in detail 
below. 

The infinitely many alleles model is central for the theory of 
molecular population genetics, for reasons discussed later. 

The Wright Fisher Infinitely Many Alleles Model 

The Wright-Fisher infinitely many alleles model follows the generic 
binomial sampling characteristic of all Wright-Fisher models. The 
nature of the mutation mechanism, discussed above, implies that 
if the mutation rate (always to new allelic types) is u, and if in 
generation t there are Xi genes of allelic type Ai {% = 1,2,3, . . .), 
then the probability that in generation t + 1 there will be Y { genes of 
allelic type A iy together with Y new mutant genes, all of different 
novel allelic types, is 

Prob{F , Yi, Y 2 , . . . | X u X 2 , . . .} = ^ IItf* (82) 
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where ttq = u and 7Tj = X{(1 — u)/ (2N), i = 1, 2, 3, . . . . 

This model differs fundamentally from previous mutation models 
(which allow reverse mutation) in that since each allele will sooner 
or later be lost from the population, there can exist no nontrivial 
stationary distribution for the frequency of any allele. Nevertheless 
we are interested in stationary behavior, and it is thus important to 
consider what concepts of stationarity exist for this model. To do 
this we consider delabeled configurations of the form {a,b, c, . . .}, 
where such a configuration implies that there exist a genes of one 
allelic type, b genes of another allelic type, and so on. The spe- 
cific allelic types involved are not of interest. The possible config- 
urations can be written down as {2N}, {2iV - 1, 1}, {2N - 2,2}, 
{2N — 2, 1, 1}, ... , {1, 1, 1, ... 1} in dictionary order: The number of 
such configurations is p(2N), the number of partitions of 2N into 
positive integers. For small values of N values of p(2N) are given 
by Abramowitz and Stegun (1965, Table 24.5), who also provide 
asymptotic values for large N. It is clear that (82) implies certain 
transition probabilities from one configuration to another. Although 
these probabilities are extremely complex and the Markov chain of 
configurations has an extremely large number of states, nevertheless 
standard theory shows that there exists a stationary distribution of 
configurations, some of the characteristics of which we now explore. 

We consider first the probability that two genes drawn at random 
are of the same allelic type. For this to occur neither gene can be a 
mutant and, further, both must be descended from the same parent 
gene (probability (2N)" 1 ) or different parent genes which were of 
the same allelic type. Writing for the desired probability in 
generation t, we get 

F 2 (m) = (1 - uf^N)' 1 + {1 - (2N)' 1 }F 2 (t) ). (83) 

At equilibrium, F 2 (m) = F 2 (t) = F 2 and thus 

F 2 = {1 - 2N + 2N(1 - u)- 2 }- 1 « (1 + 0)~\ (84) 

where, as as is standard for Wright-Fisher models, = ANu. This 
is identical to the limiting (K — > oo) value in (81). In view of the 
fact that there is no concept of the stationary distribution for the 
frequency of any allele in the infinitely many alleles case, this fact 
is perhaps surprising. 
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Consider next the probability F 3 t+1 ^ that three genes drawn at 
random in generation t + 1 are of the same allelic type. These 
three genes will all be descendants of the same gene in generation t, 
(probability (2iV)~ 2 ), of two genes (probability 3(2iV - l)((2iV)~ 2 )) 
or of three different genes (probability (2JV - 1)(2N - 2)((2iV)~ 2 )). 
Further, none of the genes can be a mutant, and it follows that 

F 3 (m) = (1 - u) 3 {2N)- 2 (1 + 3(2iV - 1)F 2 W + (2n - 1)(2N - 2)F 3 W ) . 

(85) 

At equilibrium F^ t+1 ^ = = F 3 , and rearrangement in (85) yields 
F 3 « 2(2 + 9)- 1 F 2 « 2!/[(l + 9)(2 + 9)]. (86) 
Continuing in this way we find 

Fj t+1) = (l-u)%2N -l)(2N -2)---(2N -i + l^Nf^F^ 
+ terms in i^, . . . , F 2 (t) ] (87) 

and that for small values of i, 

Fi~{i — 1)!/[(1 + 0)(2 + 0) • • • - 1 + 0)]. (88) 

We can also interpret Fj as the probability that a sample of i genes 
contains only one allelic type, or, in other words, that the sample 
configuration is {i}. This conclusion may be used to find the proba- 
bility of the sample configuration {i — 1, 1}. The probability that in 
a sample of i genes, the first % — \ genes are of one allelic type while 
the last gene is of a new allele type is — F { . The probability we 
require is, for i > 3, just i times this, or 

Prob{i-l, 1} = i{F i - 1 —F i } w i(i-2)\9/[(l+9)(2+9) ■ ■ ■ (i-1+9)}. 

(89) 

For i = 2 the required probability is 

Prob{l,l}« 0/(1 + 0). (90) 

The probabilities of other configurations can built up in a similar 
way. We illustrate this by considering the probability F^ 1 ^ that, of 
four genes drawn at random in generation t + 1, two are of one allelic 
type and two of another. Clearly none of the genes can be a mutant, 
and furthermore they will be descended from four different parent 
genes of configuration {2, 2}, from three different parent genes of 
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configuration {2, 1}, the singleton being transmitted twice, or from 
two different parent genes, both transmitted twice. Considering the 
probabilities of the various events, we find 

F 2 ( * 2 +1) = (1 - w) 4 (2iV)~ 3 ((2iV - l)(2N - 2)(2N - 3)F$ 

+ 2(2N - 1)(2N - 2)F 2 ( *J + 3(2N - 1)^) . (91) 

Retaining only higher-order terms and letting t — > oo, we obtain 

F 2 , 2 ~ (3 + 9)- 1 F 2:1 = 30/((l + 6){2 + 0)(3 + 9)). (92) 

Continuing in this way we find an approximating partition proba- 
bility formula for a sample of n of genes, where is is assumed that 
n « N . This formula can be presented in various ways. Perhaps 
the most useful formula arises if we define A = (A ± , A 2 , . . . , A n ) as 
the vector of the (random) numbers of allelic types each of which is 
represented by exactly j genes in the sample. With this definition, 

Prob(A = a) = — — — — . (93) 

v ' l«i2 a 2 ••• n a «ai!a 2 ! ••• a n \S n {6) v ; 

In this expression, a = (ai, a 2 , . . . , a n ) and S n {6) is defined by 

Sjp) = 0(0 + l)(0 + 2) ••• (0 + n-l) 

. The expression (93) was derived by Ewens (1972) and Karlin and 
McGregor (1972). It is necessary that J^jAj = YH a j = n -> an d 
it is convenient to denote J^Aj, the (random) number of different 
allelic types seen in the sample, by K, and ^) . aj, the corresponding 
observed number in a given sample, by k. 

By suitable summation in (93) the probability distribution of the 
random variable K may be found as 

Vrob(K = h) = \S*\e k lS n {9), (94) 

where \S%\ is the coefficient of 9 k in S n {9). Thus |<S^| is the absolute 
value of a Stirling number of the first kind. From (94), the mean of 
K is 

E ( A ')^ + ^T + ^2 + "' + ^T' < 95 > 

the variance of K is 
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and the probability that K = 1 is 

(n-l)\ 



(0 + l)(0 + 2)...(0 + ra-l)" 



(97) 



A formula equivalent to (93) is the following. Suppose that in 
the sample we observe k different allelic types. We label these in 
some arbitrary order as types 1,2, ... ,k. Then the probability that 
K = k and also that with the types labelling in the manner chosen, 
there are rt\, n-i, . . ., n& genes respectively observed in the sample 
of these various types, is 

nW k . . 

(98) 



k\nin 2 ■ ■■n k S n {0) ' 

We now turn to eigenvalue calculations. Equation (83) can be 
rewritten in the form 

F (t+i) _ F (oo) = (1 _ m)2{1 _ ( 2AT)- 1 }{F 2 W - F 2 (oo) }, (99) 

and this implies that (1 — u) 2 {l — (2iV) -1 } is an eigenvalue of 
the Markov chain configuration process discussed above. A sim- 
ilar argument using (85) shows that a second eigenvalue is (1 — 
uf{\ - (2A^)~ 1 }{1 - 2(2A0~ 1 }. Equations (87) and (91) suggest 
that (1 - u) A {l - (2N)~ 1 }{1 - 2(2JV)" 1 }{1 - 3(2A^)~ 1 } is an eigen- 
value of multiplicity 2. It is found more generally that 

Aj = (1 - uf{l - (2iV)- 1 }{l - 2{2N)~ 1 } ...{l-(i- l)(2N)- 1 } 

(100) 

is an eigenvalue of the configuration process matrix and that its 
multiplicity is p(i) — p(i — 1), where p(i) is the partition number 
given above. This provides a complete listing of all the eigenvalues. 

We consider next the mean number of alleles existing in the pop- 
ulation at any time. Any specific allele A m will be introduced into 
the population with frequency (27V) -1 , and after a random num- 
ber of generations will leave it, never to return. The frequency of 
A m is a Markovian random variable with transition matrix given 
in (36), with ipi defined immediately below (36). There will exist 
a mean time E(T), measured in generations, that A m remains in 
the population. The mean number of new alleles to be formed each 
generation is 2Nu, and the mean number to be lost each generation 
through mutation and random drift is E(K)/E(T), where E(fT) is 
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the mean number of alleles existing in each generation. It follows, 
by balancing the number of alleles gained each generation with the 
number lost, that at stationarity, 

E(K) = 2NuE(T). (101) 

An approximation to E(T) is found by putting p = (2N)~ 1 in (39). 
This gives, to a close approximation, 

i 

E(K)^9 + J 9x~ 1 (l -xf^dx. (102) 

(27V)- 1 

A more detailed approximation is possible. If E(K(xi,x 2 )) is the 
mean number of alleles present in the population with frequency in 
any interval (x±,x 2 ) ((2iV) -1 < x\ < x 2 < 1), then 

X2 

E(K(xi, x 2 )) ~ J Ox' 1 {I -xf^dx. (103) 

This equation can be used to confirm (95). An allele whose popula- 
tion frequency is x is observed in a sample of size n with probability 
1 — (1 — x) n . From this and (103) it follows that the mean number 
of different alleles observed in a sample of size n is approximately 

i 



and the value of this expression is equal to that given in (95). The 
function 

(j>{x) = 9x-\l -xf- 1 (105) 

is called the "frequency spectrum" of the process considered, and 
will be discussed in detail by Dr Griffiths. Ignoring small-order 
terms, it has the (equivalent) interpretations that the mean number 
of alleles in the population whose frequency is in (x, x + 5x), and also 
the probability that there exists an allele in the population whose 
frequency is in this range, is, for small 5x, equal to 9x~ 1 (l — x) e ~ 1 5x. 

The frequency spectrum can be used to arrive at further results 
reached more laboriously by discrete distribution methods. Thus, 
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for example, 



Prob{only one allele observed in a sample of n genes} 
1 

« 9 J x n {x- 1 (l-x) 9 - 1 }dx 

l)\/((l + 9)(2 + 9) ••• (n-l + 9)) 



o 

= (n - 



and this agrees with the expression in (88) with the notational 
change of n to %. More complex formulas such as (93) can be re- 
derived using multivariate frequency spectra, but we do not pursue 
the details. 



The form of the frequency spectrum also shows that when 9 is 
small, the most likely situation to arise at any time is that where 
one allele has a high frequency and the remaining alleles are all at 
a low frequency. This occurs for two reasons. The first of these is 
historical: Different alleles enter the population an different times, 
and an "older" allele has had more time to reach a high frequency 
than a "younger" allele. Second, imbalances in allelic frequencies 
arise through stochastic fluctuations, as in the K-allele model as 
discussed below (81). This imbalance agrees qualitatively with that 
found surrounding (81) for the ^-allele model. 



A final result obtained from the frequency spectrum is the fol- 
lowing. Practical population geneticists have long been interested in 
the degree of genetic variation present in a population. In practice 
there will almost always be some variation, so that in practice what 
is meant is "non-trivial variation", or "non-trivial polymorphism." 
The classic definition of such a polymorphism, given by Harris (1980, 
p. 331), is that a locus is polymorphic if the population frequency 
of the most frequent allele in the population of interest is no more 
than 0.99. Thus in this sense a population is not polymorphic if the 
frequency of any allele exceeds 0.99. From the frequency spectrum, 
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the probability of polymorphism is 



i 



1-9 




xf^dx 



(106) 



0.99 



1 




xf- 1 dx 



0.99 



1 - (0.01)°. 



For 9 = 0.1, for example, this probability is only about 0.37. How- 
ever, for larger values of 9, for example for 6 > 1, this probability 
exceeds 0.99. 

Several results for the infinitely many alleles model can be ob- 
tained directly from two-allele theory. For example, we may wish 
to find the mean number of generations until all alleles currently 
existing in the population have been replaced by new alleles, not 
currently existing in the population. This may be found from two- 
allele theory by identifying all currently existing alleles with the 
allele A±, initially having current frequency p = 1 in the population, 
and seeking the mean number of generations until loss of this allele. 
This expression is given in (43), or more accurately in (44). 

Although the theory is by no means clear, it is plausible that to a 
first approximation, all the results given in this section continue to 
apply in more complicated Wright-Fisher models, involving perhaps 
two sexes or geographical structure, if the parameter 9 is defined as 



where N e is one or other version of the effective population size (a 
concept that is discussed later). 

The Cannings Infinitely Many Alleles Model 

The reproductive mechanism in the nonoverlapping generations Can- 
nings infinitely many alleles model follows that of the general prin- 
ciples of the Cannings two-allele model. That is, the model allows 
any reproductive scheme consistent with the exchangeability and 
symmetry properties of the two-allele model. The mean number of 
offspring genes from any "parental" gene is 1, and the variance of 



9 = 4N e u, 



(107) 



3G 



the number of offspring genes is a 2 , necessarily the same for each 
parental gene. The mutation assumptions are as described above, 
in particular that all mutant offspring genes are assumed to be of 
novel allelic types. 

Many of the results of the Wright-Fisher infinitely many alleles 
model apply for the Cannings model, at least to a close approxi- 
mation, provided that the parameter 9, arising in many for- 
mulas associated with the Wright-Fisher model, is replaced 
throughout by 9 jo 2 . Therefore we do not explore the Cannings 
model further, and instead use Wright-Fisher formulae, with this 
change of definition of 9, to apply for the Cannings model. 



The Moran Infinitely Many Alleles Model 

The Moran infinitely many alleles model is the natural extension 
to the infinitely many alleles case of the Moran two alleles model. 
Haploid individuals, which we may identify with genes, are created 
and lost through a birth and death process, as in the two-alleles case, 
with the standard the infinitely many alleles model assumptions that 
an offspring gene is a mutant with probability u and that any new 
mutant is of an entirely novel allelic type. 

The stochastic behavior of the frequency of any allelic type in the 
population is then governed by (71), implying (as for the Wright- 
Fisher and Cannings models) that there can be no concept of sta- 
tionarity of the frequency of any nominated allelic type. On the 
other hand there will exist a concept of the stationary distribution 
of allelic configurations. The possible configurations of the process 
are the same as those for those models, but for the Moran model 
an exact population probability can be given for each configura- 
tion. Suppose that f3j (j = 1,2, ...,27V) is the number of allelic 
types with exactly j representative genes in the population, so that 
= 2N. The quantity f3j is the population analogue of the 
sample number aij in (93). The exact stationary distribution of the 
population configuration process is 

(27V)! # E/3j 

Prob(A,/? 2 , . . .,[3 2N ) = ^ . . . {2N)fhN . . . M SMe) ■ 

(108) 

Here Sj(-) is defined below (93) and 9 is defined for this model by 

9 = 2Nu/(l-u). (109) 
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This is a different definition of 9 than that applying for 
the Wright Fisher model, and is always to be used as the 
definition of 9 when referring to the Moran model. 

The expression (108) is of exactly the same form as (93), with 
n replaced by 2N and atj by (3j. Thus several of the calculations 
arising from (93) are exact for the Moran population process. For 
example, the distribution of the number K 2 n of allelic types in the 
population is given exactly by (94), with n replaced by 2N. Thus, 
immediately from (94), the probability that K 2 n = 1 is, exactly, 

( 2N -^ (HQ) 

The mean of K 2 n is given by (95), with n replaced by 2N and 9 
defined by (109), and the variance K 2 n is 

27V- 1 

var(i^v) = 9 £ (111) 

An exact expression is available for the Moran model (discrete) 
frequency spectrum, for which (105) gives the approximate Wright- 
Fisher model formula. To find this we consider first the "two-allele" 
model (71). In the infinitely many alleles case we think of A\ as 
a new arisen allele formed by mutation and A 2 as all other alleles. 
Standard theory can be used to find the mean number /x(T) of birth 
and death events before the certain loss of A\ from the population. 
This is 

The form of ergodic argument that led to (102) shows that at sta- 
tionarity, the mean of the number K 2 n of different allelic types 
represented in the population is u/i(T), which is 

where here and throughout we use the standard gamma function 
definition 

(M\ _ r(M + 1) M(M - 1) • • • (M - m + 1) 

\m) ~ m\r(M - m + 1) ~ m\ 
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for non-integer M. The expression (113) provides the further infor- 
mation that the typical jth term gives the stationary mean number 
of alleles arising with j representing genes in the population at any 
time. In other words, the exact frequency spectrum for the Moran 
model is 

^((7)r/-y> <-> 

A standard asymptotic formula for the gamma function for large N 
shows the parallel between this exact expression with the diffusion 
theory frequency spectrum (105). 

Various special cases of (114) are of interest. For example, when 
9 = 1, (114) simplifies to j" 1 , in which form the parallel with the 
Wright-Fisher approximation (105) is obvious. However, the differ- 
ent formulae for 9 for the two models should be kept in mind when 
this comparison is made. 

Two of the above expressions are of independent interest. First, 
the expression (112) has the further interpretation that its typical 
term is the mean number of birth-and-death events for which there 
are exactly j copies of the allele in question before its loss from the 
population. It is interesting to evaluate the expression in (112) for 
specific values of 9. When 9 = 2, it is about 27Vlog(2iV) birth and 
death events, or about log(2iV) "generations". The corresponding 
approximation for the Wright-Fisher model, found from (39), is also 
log(2iV) generations, but this formal equality is misleading because 
of the different definitions of 9 in the two cases. 

Second, the expression (113) simplifies to 

9 9 9 9 

9 + 9 + 1 + 9 + 2 + " ' + + 2N-1' 

This is identical to the expression given in (95), with n replaced by 
2N, as we know it must be. 

Many further exact results for the Moran model are available. 
Here are several. 

First, if at any time there is only one allele in the population, 
we say that that allele is "quasi-fixed" in the population. (We do 
not use the expression "fixed", since in an infinitely many alleles 
model this allele will eventually be lost from the population.) The 
probability that a new mutant eventually becomes quasi-fixed can 
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be found as follows. Call the allelic type of the new mutant A\ 
and group together all other genes as U A 2 " genes. Then standard 
continuant Markov chain theory shows that the probability that a 
new mutant allele eventually becomes quasi-fixed in the population 
is C _1 , where 

-frrxmr- <-•> 

This is a different probability than the probability that, at any 
specified time, the population is "quasi-fixed" for one or other allele. 
This latter probability is given by the j = 2N term in the exact 
Moran frequency spectrum (114), namely 

» ,f2N + .,, 



2N V \ 2N 
or, more simply, 

(27V- 1)! 



(l + 0)(2 + 0)...(2iV-l + 0)" 



;ii7) 



To illustrate the difference between the probability defined by (115) 
and the probability defined by (117), when 9 = 1 the former proba- 
bility is 

1/11 1 \-l / X 

2iV< 1+ 2 + 3 + "' + 2iV> <" 8 > 

while the latter probability is 

(Parenthetically, no exact probability for quasi-fixation of one or 
other allele is known for the Wright-Fisher model. The most accu- 
rate approximation available is that of Watterson (1975), namely 

exp(-O.lOO30)r(l + 9){2N)- 6 . (119) 

This expression gives numerical values quite close to those given by 
(117) for a wide range of values of 9, although the different defi- 
nitions of 9 for Moran and Wright-Fisher models must be kept in 
mind when making this comparison.) 

Second, it is immediate that the probability that a gene drawn at 
random from the population is of an allelic type represented j times 
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in the population is found by multiplying the expression in (114) by 
j/(2N). This gives 

^((7)r/T) ™ 

for this probability. We check that the sum of this expression over 
j = l,2,...,2N isl. 

Third, (113) allows an exact calculation of the probability of 
population polymorphism, as defined by Harris. Any allele having 
a frequency exceeding 0.99 must be the most frequent allele in the 
population, and at most one allele can have such a frequency. Thus 
the probability that the most frequent allele in the population has 
frequency exceeding 0.99 is the mean number of alleles with fre- 
quency exceeding 0.99. Taking 0.99(2iV) as an integer M, (113) 
shows that the Harris probability of polymorphism is 




1 



j=M+l 



(121) 



This is close to 1 — (0.01) e , the approximate value found above 
for the Wright-Fisher model using a diffusion approximation. As 
with other such comparisons, this apparent similarity is misleading 
because of the different definitions of 9 in the two models. 

The final result concerns the mean number of birth and death 
events until all alleles present in the population at any time are lost. 
This is the Moran model analogue (once an adjustment is made 
between generations and birth-death events) for the mean number 
of generations until allele current alleles in a population are lost in 
the Wright-Fisher model, an approximation for which is given in 
(43), or more accurately in (44). In the case of the Moran model 
an exact calculation is available, namely that the required mean 
number of birth and death events is 



2N(2N + 0){0- l)- 1 J' 1 - ^ ( 



2N\ [2N + 6- IV 1 ' 

3 



(122) 

(A formula different from (122), found by applying l'Hopital's rule, 
applies for the case 6 = 1.) In the case = 2, the expression (122) 



41 



gives, exactly, 8N 2 (N + 1)/(2N + 1), or about AN 2 , birth and death 
events. This can be thought of as corresponding to 4N "genera- 
tions", which appears to agree closely with the Wright-Fisher ap- 
proximation in (45). This agreement is, however, misleading, since 
the definitions of 6 differ in the two models. 

There are several further comments to make about (122). First, 
The typical (jth) term in (122) is the mean number of birth and 
death events for which there are exactly j genes present of the var- 
ious original alleles in the population before the eventual loss of all 
these alleles. Thus the expression (122) gives more information than 
might otherwise be thought. 

Second, although the identity is not immediately obvious, the 
expression in (122) is identical to the expression 

2N 

2N(2N + e)J2 j{j + e _ 1 y (123) 

We shall see later that the individual terms in the sum also have an 
important interpretation, in this case concerning the past history of 
the population rather than its future evolution. 

Third, recalling the definition 9 = 2Nu/(l — u) for the Moran 
model, the expression in (123) may be written equivalently as 

2N 

g^' (124) 

where 

V J- 2N ' W i~ (2N) 2 ■ {L2b) 

We shall later see why expressions of the form defined by (124) and 
(125) arise. 

Further exact Moran model results, relating to "time" and "age" 
properties, will be discussed later. 



Complications and the effective population size 
Introduction 

All the theory described above (and also that described later) makes 
a large number of assumptions, genetical, modelling and demo- 
graphic. The main genetical assumption is that there is no selection 
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involved between the alleles that we consider. Clearly, and espe- 
cially in light of the Darwinian paradigm, this means that a very 
large proportion of population genetics theory, that relating to se- 
lection, is not considered. Another important aspect of reality that 
is ignored is the existence, for the great majority of species of in- 
terest to us, of two sexes and the diploid nature of the individuals 
in those populations. From the modelling point of view, the three 
models considered (Wright-Fisher, Cannings and Moran) cannot be 
expected to describe accurately any real-life population, even though 
they do provide some insights into the evolutionary genetic behav- 
ior of real populations. Finally, many demographic features, such as 
the geographical dispersion of a population and changes over time 
in the size of the population of interest, have been ignored. 

The concept of the effective population size is meant to address 
some of these deficiencies, and in this section we define this concept 
and examine some of its properties. 



Three concepts of the effective population size 

Even though the Wright-Fisher model is less plausible than several 
other available models as a description of biological reality, it has, 
perhaps for historical reasons, assumed a central place in population 
genetics theory. This model has three properties that relate to the 
population size: 

(i) its maximum non-unit eigenvalue = 1 — (2N) -1 , 

(ii) the probability that two genes taken at random are descendants 
of the same parent gene = (2N) -1 , 

(iii) vai{x(t + 1) | x(t)} = x(t){l - x(t)}/(2N), where x(t) is the 
fraction of A\ genes in generation t. 

In view of these properties it is perhaps natural, if the Wright- 
Fisher model (6) is to be used as a standard, to define the effective 
population size in diploid models that are more complicated and 
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realistic then (6) in the following way: 

= eigenvalue effective population size = |(1 — A max )ti?6) 
= inbreeding effective population size = (2tt 2 )~ 1 , (127) 
= variance effective population size 
x(t){l-x(t)} 
2 var{x(t + 1) | x(t)} ' 1 ' 

Here A max is the largest nonunit eigenvalue of the transition matrix 
of the model considered and rr 2 is the probability, in this model, that 
two genes taken at random in any generation are descendants of the 
same parent gene. Similarly, var{x(t+l)} is the conditional variance 
of the frequency of A\ in generation t + 1 in the more complicated 
model, given the value of this frequency in generation t. 

A fourth concept of effective population size, namely the mu- 
tation effective size, is also possible, but we do not consider this 
concept here. 



Application to the Cannings model 

In this section we consider the application of the effective population 
size concept for the Cannings model, and limit attention for the 
moment to those versions of the model where generations do not 
overlap. Equations (54) and (126) show immediately that for these 
models, the eigenvalue effective population size Ne^ is given by 

N^ = {N-\)/a\ (129) 

where a 2 is the variance in the number of offspring genes from any 
given gene. Equations (56) and (128) show that the variance effec- 
tive population size is given by 

jVf) = (TV-l)/^. (130) 

A value for can be found in the following way. Suppose that 
the ith gene in generation t leaves rrii offspring genes in generation 
t + 1, (Yl rrii = 2N). Then the probability, given mi, . . . , m 2 Ar, that 
two genes drawn at random in generation t + 1 are descendants of 
the same gene is 

27V 

rrn(mi - 1)/{2N(2N - 1)}. (131) 
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The probability 7T2 in (127) is the expected value of this random 
variable. Now mi has mean unity and variance a 2 , so that, on 
taking expectations, ir 2 = a 2 /(2N — 1). From this, 

Nf = {N-\)/a\ (132) 

It follows from these various equations that for the Cannings model, 
all three effective population sizes are equal. 

One application of this conclusion is the following. If leading 
terms only are retained, all three definitions of the effective pop- 
ulation size in the Cannings model are N/a 2 . From the remarks 
surrounding (107), it is plausible that the various Wright-Fisher 
infinitely many alleles model results apply for the nonoverlapping 
generation Cannings model if 9 is defined wherever it occurs by 
AN e u. That is, to a close approximation, we define 9 for the Can- 
nings model by 

9 = ANu/a 2 . (133) 

As stated earlier, the definition of 9 given in (133) is to be used 
whenever the Cannings model is discussed. 

Application to the Moran model 

The three definitions of the effective population size given above are 
not appropriate for models where generations overlap. If we write 
N e for any one of the effective population sizes defined in (126)- 
(128), it seems reasonable for such models to define the effective 
population size as N e k/(2N), where k is the number of individuals 
to die each time unit. Since k = 2N for models where generations 
do not overlap, this leaves (126)-(128) unchanged for such models. 
For the Moran model, where k — 1, this convention yields 

N (e) = N (i) = N (v) = l N _ ( 134 ) 

The equations show that the effective population size in the Moran 
model is half that in the Wright-Fisher model. We now discuss the 
reason for this. 

Arguments parallel to those leading to (24) show that if two al- 
leles Ai and A 2 are allowed in the population, the mean time until 
fixation of one or other allele in the Cannings model is 

tip) w -{AN -2){p\ogp + (1 -p)log(l -p)}/(J 2 , (135) 
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where p is the initial frequency of A\ and o 2 is defined above. This 
formula explains the factor of 2 discussed after equation (74). In 
the Wright-Fisher model tr 2 « 1 while in the Moran model a 2 ~ 
2/ (2N). Setting aside the factor 2N as explained by the conversion 
from generations to birth and death events, it is clear that the crucial 
factor is the difference between the two models in the variance in 
offspring distribution. 



Diploid organisms 

So far we have ignored the diploid nature of most organisms of 
interest, and we now consider a definition of effective population 
size for the diploid case. We do this here for a Cannings model, and 
devise an inbreeding effective population number that allows for the 
diploid nature of the organisms in the population. This number will 
be denoted Ne %d \ and is defined as the reciprocal of the probability 
that two genes taken at random in generation t + 1 are descended 
from the same individual in generation t. This is tantamount, in 
the Cannings model, to selecting two genes at random in generation 
t and asking whether the two genes drawn at random in generation 
t + 1 are both descended from one or other or both of these. In the 
notation of (131), the probability of this event can be written as the 
expected value of 

TV 

^(m, + m N+i )( mi + m N+i - 1)/{2N{2N - 1)}. (136) 
i=i 

It is not hard to see this leads to 

N {id) = 4iV ~ 2 (m) 

where a\ is the variance of the number of offspring genes from each 
(diploid) individual. It is therefore necessary to extend the Cannings 
model to the diploid case. We define a diploid Cannings model as 
one for which the concept of exchangeability relates to monoecious 
diploid individuals. We also assume that the gene transmitted by 
any individual to any offspring is equally likely to be each of the two 
genes in that individual, is independent of the gene(s) transmitted 
by this individual to any other offspring, and is also independent of 
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the genes transmitted by any other individual. With these conven- 
tions it can be shown that 

- 2 = (138) 

where a 2 is the Cannings model gene "offspring number" variance, 
and from this it follows that the expressions in (132) and (137) are 
identical. 



More realistic Wright-Fisher models 

We turn next to the second class of models where a definition of ef- 
fective population size is useful, namely those Wright-Fisher models 
which attempt to incorporate biological complexity more than does 
the simple Wright-Fisher model (6). 

The first model considered allows for the existence of two sexes. 
Suppose in any generation there are Ni diploid males and N 2 diploid 
females, with N ± + N 2 — N. The model assumes that the genetic 
make-up of each individual in the daughter generation is found by 
drawing one gene at random, with replacement, from the male pool 
of genes, and similarly one gene with replacement from the female 
pool. If Xi(t) represents the number of A\ genes among males in 
generation t and X 2 (t) the corresponding number among females, 
then Xi(t + 1) can be represented in the form 

X 1 (t + l) = i(t + l)+j(t + l), (139) 

where i(t+l) has a binomial distribution with parameter X\(t) / (2N\) 
and index Ni, and j(t + 1) has a binomial distribution with pa- 
rameter X 2 (t)/(2N 2 ) and index N\. A similar remark applies to 
X 2 (t + 1), where now the index is N 2 rather than Ni. Evidently the 
pair {Xi(t), X 2 (t)} is Markovian, and there will exist a transition 
matrix whose leading nonunit eigenvalue we require to find so that 
we can calculate N*f\ 

To do this it is necessary to find some function Y(X 1 ,X 2 ) which 
is zero in the absorbing states of the system, positive otherwise, and 
for which 

E\Y{X 1 (t + l),X 2 (t + l)} | X 1 (t),X 2 (t)} = \Y(t) (140) 

for some constant A. Such a function always exists, but some trial 
and error is usually necessary to find it. In the present case it is 
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found, after much labor, that a suitable function is 

Y(X 1 ,X 2 ) = lC{X 1 (2N 1 -X 1 )(2N 1 )- 2 + X 2 (2N 2 -X 2 )(2N 2 )- 2 } 
+ {i _ ( Xl - N 1 )(X 2 - 7V 2 )7V 1 - 1 7V- 1 }, (141) 

where 

C = i{l + (1 - 2^ - 2iV 2 - 1 ) 1 / 2 }. 
With this definition the eigenvalue A becomes 

A = \[l - (AN^- 1 - {AN^ 1 + {1 + N^AN^y 2 } 1 ' 2 }, (142) 

or approximately 

A w 1 - {N x + iV 2 )(8A^ 1 iV 2 )- 1 . (143) 
From this result and (126) it follows that to a close approximation, 

=AN 1 N 2 N-\ (144) 

If JVi = N 2 (= §7V), then iV e (e) w JV, as we might expect, while if 

Ni is very small and N 2 is large, A^i 6 ^ ~ ANi. This latter value is 
sometimes of use in certain animal-breeding programs. 

The inbreeding population size is found much more readily. Two 
genes taken at random in any generation will have identical parent 
genes if both are descended from the same "male" gene or both from 
the same "female" gene. The probability of identical parentage is 
thus 

^2 = i^ r {(2iV 1 )- 1 + (2iV 2 )- 1 }, 

and from this it follows that 

= (27T2)" 1 w 4iV 1 iV 2 iV- 1 . (145) 

The variance effective population size cannot be found so readily, 
and indeed strictly it is impossible to use (128) to find such a quan- 
tity, since an equation of this form does not exist in the two-sex case 
we consider. The fraction of A\ genes is not a Markovian variable 
and in particular, using the notation of (128), the variance of x(t+l) 
cannot be given in terms of x(t) alone. This indicates a real defi- 
ciency in this mode of definition of effective population size. On the 
other hand, sometimes there exists a "quasi-Markovian" variable 
exists in terms of which a generalized expression for the variance 
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effective population size may be defined. In the present case the 
weighted fraction of Ai genes, defined as 

x(t) = X 1 (t)/(AN 1 ) + X 2 (t)/(AN 2 ) 

has the required quasi-Markovian properties, and 

var{:r(t + 1) | x(t)} = x(t){l - x(t)}N(8N 1 N 2 )' 1 + 0(N{ 2 , N 2 2 ). 

From this a generalized variance effective population size may be 
defined, in conjunction with (128), as 

N( v) = AN 1 N 2 N~ 1 . (146) 

Thus for this model, m ~ Ne V \ although strict equality 
does not hold for any of these relations. 

We return now to the case of a monoecious population and con- 
sider complications due to geographical structure. A simplified 
model for this situation which, despite its obvious biological un- 
reality, is useful in revealing the effect of population subdivision, 
has been given by Moran (1962). 

It is supposed that the total population, of size N(H + 1), is sub- 
divided into H + 1 sub-populations each of size N, and that in each 
generation K genes chosen at random migrate from subpopulation 
i to subpopulation j for all i, j (i ^ j). Suppose that in subpopu- 
lation % there are X^t) A\ genes in generation t. There is no single 
Markovian variable describing the behavior of the total population, 
but the quantities Xi(t) are jointly Markovian, and to find Ne^ it 
is necessary to find some function Y(t) = Y{Xi(t), . . . , X H+ i(t)} 
obeying an equation parallel to (140). It is found, after some trial 
and error, that a suitable function Y(t) is 

Y(t) = [A- D + {(A- D) 2 + 4BC} 1/2 } ^ Xi(t){2N - X^t)} 

% 

+ 2fl££*(*){o v -*;(*)}, (147) 



where 



A = (AN 2 + H 2 K 2 + K 2 H — 2N — ANKH)/AN 2 , 

B = (AKN - K 2 H - K 2 )/(AN 2 ), 

C = (AHKN — K 2 H 2 — K 2 H)/(AN 2 ), 

D = (AN 2 + HK 2 + K 2 -AHK)/(AN 2 ). 
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With this definition of Y(t), the eigenvalue A satisfying 



E{Y(t + l) \X 1 (t),.. 



,X H+1 (t)} = \Y(t) 



is 



A = \ (A + D + {{A - D f + ABC} 1 ' 2 ) . 



(148) 



If small-order terms are ignored, this yields eventually 



« N(H + 1){1 + (2K(H + l))}- 1 } 



(149) 



for large H and K. This equation is in fact accurate to within 10% 
even for H = K = 1 , and it thus reveals that population subdivision 
leads to only a slight increase in the eigenvalue effective population 
size compared to the value N(H + 1) obtaining with no subdivision. 

The inbreeding effective population size can be found most 
efficiently by noting that it is independent of K, since the act of 
migration is irrelevant to the computation of its numerical value. 
Thus immediately from (132) 



since each gene produces a number of offspring according to a bino- 
mial distribution with index 2N and parameter (2iV) _1 . This value 
clearly differs only trivially from the true population size N(H + 1) 
and, for small H and K, it differs slightly from Ne^ . 

Because of these two results, one may be tempted to ignore geo- 
graphical sub-division in modelling evolutionary population genetic 
processes. 

The computation of is beset with substantial difficulties 
since there exists no scalar Markovian variable for the model. In- 
deed, unless migration rates are of a large order of magnitude, there 
is not even a "quasi-Markovian" variable. Because of this no satis- 
factory value for jvj^ has yet been put forward for the geographical 
structure case. 

We consider finally a population whose size assumes cyclically 
the sequence of values Ni, N2, N3, . . . , Nk, N±, N2, . . . . There is no 
unique value of Ne e \ iV"j^ or in this case, and it is convenient 
to extend our previous definition to cover k consecutive generations 
of the process. If the population size in generation t + k is JVj, it is 
easy to see that if X(t) is the number of A 1 genes in generation t, 



N^ = {N(H + l)-l}/{l-(2N)- 1 }, 



(150) 
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and in each generation reproduction occurs according to the model 
(6), 

k 

E[X(t+k){2N~X(t+k)} | X(t)} = X(t){2N l -X(t)}l[{l-(2N l )- 1 }. 

i=l 

Defining now by the equation 

k 

{i - (2N^- 1 } k = n^ 1 - (^r 1 }, 

i=l 

it is clear that if k is small and the TVj large, 

iV e (e) w k{N{ x + ■■■ + N^ 1 }- 1 . (151) 

Thus the eigenvalue effective population size is effectively the har- 
monic mean of the various population sizes taken during the k- 
generation cycle. A parallel formula holds for Ne \ although here 
it is easier to work through the probability Q(t + k) that two genes 
in generation t + k do not have the same ancestor in generation t. 
Clearly 

Q(t + k) = {1 - {2N^ l )- 1 }Q{t + k-l), 
and iteration over k generations gives 

k 

Q(t + k) = ]J{l-(2N i )- 1 }Q(t). 

i=i 

Elementary calculations now show that is also essentially equal 
to the harmonic mean of the various population sizes. Again, if x(t) 
is the fraction of A\ genes in generation t, 

vai{x{t+k) | x(t)} = lk{N^+N^+- ■ ■+N^ 1 }x(t){l-x(t)}+0{N- 2 ). 

This shows that to a suitable approximation, ivi^ is also the har- 
monic mean of the various population sizes. 

We conclude this section by noting that many problems exist 
with the concept of the effective population size. Perhaps the most 
notable is the following. The expression "effective population size" 
is widely used in areas associated with population genetics, espe- 
cially in connection with the evolution of the human population, 
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by authors who appear to have no idea of its intimate connection 
to the Wright-Fisher model or of the fact that different concepts of 
the effective population size exist. The numerical values given by 
these different concepts can differ widely for a population whose size 
increases with time, as with the human population, so that many 
dubious claims about the "effective size" of the human population 
at some given time in the past exist in the literature. 

Molecular population genetics 
Introduction 

The infinitely many alleles model was inspired by the knowledge 
of the gene as a sequence of nucleotides. There are four possible 
nucleotides at each site in this sequence, a, g, c and t, and an "allele" 
is simply one specific sequence, such as tccgagtgcat...tc. In a typical 
gene, consisting of a sequence of 3000 nucleotides, there are 4 3000 
possible sequences, that is 4 3000 possible alleles. For essentially all 
practical purposes we may take this number as infinity, thus leading 
to the infinitely many alleles model. Thus this model is one of 
molecular population genetics, since it is inspired by knowledge of 
the molecular nature of the gene. 

Another model inspired by this knowledge is the "infinitely many 
sites" model, described in detail in these lectures by Dr Griffiths. 
However, some aspects of this model are discussed below. 

There are three points where the mathematical population genet- 
ics theory based on nucleotide frequencies differs from the classical 
theory based on gene frequencies. First, the molecular theory is 
dynamic, in contrast to the often static classical theory. Mutations 
are usually seen as leading to new allelic types rather than back to 
currently or previously existing types, since it is plausible that most 
nucleotide mutations will lead to sequences not currently existing in 
the population. Both the infinitely many alleles and the infinitely 
many sites models were originally proposed with this view in mind. 

Second, while the classical theory concerns the evolution of genes 
given labels U A 1 " , U A 2 " , etc., at the molecular level the actual ge- 
netic material is known, so that the symbols a, g, c, and t refer 
to specific rather than type entities. The fact that the theory thus 
concerns ultimate and real entities is of great importance. Perhaps 
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the most important point derives from the fact that molecular con- 
siderations often lead to retrospective rather than prospective evolu- 
tionary questions. Classical population genetics theory was largely 
prospective: Given reasonable numerical values for various genetic 
parameters, the main aim was to show that evolution as a genetic 
process could and would occur. A hundred years ago such an under- 
taking was required. It is, however, no longer necessary to do this, 
and it now appears more useful to attempt to describe the course 
that evolution has taken by a retrospective analysis, and thus to 
gain empirical insight into evolutionary questions. This change of 
viewpoint has also led to the introduction of statistical methods 
for analyzing current genetical data, discussed below. The current 
emphasis on statistical inference procedures is perhaps the most im- 
portant new direction in the theory in recent times. Knowledge of 
the actual genetic material is essential for these inferences, and the 
entire retrospective analysis must therefore be carried out in the 
framework of molecular population genetics. 

Finally, most of the retrospective theory is inferential, and for 
reasons of practicality relies for the inferences made on the data 
available from a sample of genes taken from the population of inter- 
est, rather than from data from the entire population. We denote 
throughout the number of genes in this sample by n. Almost all 
results given below relate to such a sample. 

We consider first the Wright-Fisher infinitely many alleles model. 
The properties of a sample of n genes under this model are best 
summarized through the (approximating) partition formula (93). 
This leads to the distribution of the number K n of different allelic 
types observed in the sample as given in (94) and thus to the mean 
of K n as given by (95). 

There is currently much interest in estimating the parameter 6. 
Equations (93) and (94) show jointly that the conditional distribu- 
tion of the vector A = (Ai,A 2 ,. . . ,A n ) defined before (93), given 
the value of K n , is 

n' 

Prob{A = a\K n = k} = -- - : — , (152) 

where a = (ai, a 2 , ■ ■ ■ , a n ). Equation (152) implies that K n is a suf- 
ficient statistic for 9. Standard statistical theory then shows that 
once the observed value k n of K n is given, no further information 
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about 9 is provided by the various Oj values, so that all inferences 
about 9 should be carried out using the observed value k n of K n only. 
The main inferential procedure deriving from (152) is the testy of 
hypothesis that the alleles in the sample are selectively equivalent. 
The fact that, in practice, 9 is unknown does not matter for such 
inferences, since the conditional distribution (152) is independent of 
9, and forms the "null hypothesis" distribution of the allelic parti- 
tion. Tests of the neutrality hypothesis are discussed in a separate 
section below. 

It is also possible to use (94) to estimate 9 or more generally 
any function of 9. Since K n is a sufficient statistic for 9 we can use 
the probability distribution in (94) directly to find the maximum 
likelihood estimator 9 K of 9. It is found that this estimator is the 
implicit solution of the equation 

K n = § ^ + + J^ + ...+ ; §K . (153) 

9 K 9 k + 1 9 k + 2 9 K + n-l 

Given the observed value k n of K n , the corresponding maximum 
likelihood estimate 9k of 9 is found by solving the equation 

k n = ^ + J*- + ...+ ; ^ ■ (154) 

e k e k + i e k + 2 9 k + n-i 

Numerical calculation of the estimate 9 k using (154) is usually nec- 
essary. 

The estimator implied by (153) is biased, and it is easy to show 
that there can be no unbiased estimator of 9. On the other hand, 
there exists an unbiased estimator of the population homozygosity 
probability 1/(1 + 9). If this estimator is denoted by g(K n ), (94) 
shows that 

^ \S k n \9 k g(k) _ 1 
ti S n (9) 1 + 9' 

where \S^\ is the absolute value of a Stirling number, defined below 
(94). From this, 

n 

\S k n \e k g{k) =9(9 + 2)(9 + 3)---(9 + n-l). 

k=i 
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Since this is an identity for all 9, the expression for g(k) for any 
observed value k n of K n can be found by comparing the coefficients 
of 9 k on both sides of this equation. In particular, when k n — 2, 

1 2 3 t n-l 

Unbiased estimation of 1/(1 + 0) for values of fc n larger than 2 is 
complicated, and it is then probably more convenient to use instead 
the estimator + where 0# is found from (153), even though 

this estimator is slightly biased. 

Geneticists sometimes prefer to estimate (1 + #)~ 1 by /, defined 
in the notation of (98) by 

nf 



This is a poor estimate in that it uses precisely that part of the data 
that is least informative about (1 + 9)~ x . The estimate of 9 derived 
from /, namely 

fl/ = /- 1 -l, (157) 

is biased and has mean square error approximately six or eight times 
larger than that of 9. 

An approximation for the mean square error (MSE) of the es- 
timator 9k as defined by (153) is found as follows. Writing the 
right-hand side of (153) as ^{9 K ), we have K n = vp(9 K ) and also, 
from (95), E(K n ) =if>(6). Thus by subtraction, 

K n -E(K n )=^j(9 K )-^(9). 

A first-order Taylor series approximation for the right-hand side is 
(9 K - 9)4>'(9), so that 

K n -E(K n )^(9 K - 9)^(9). 
Squaring and taking expectations, we get 

MSEfc) « (158) 

The variance of K n is given in (96), and it is immediate that 
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This leads to 

MSE(^) » — ^ , . (160) 

The approximation (160) appears to be quite accurate. 

Exact sample results for the Moran model can be obtained rapidly, 
since under the Moran infinitely many alleles model, (93) holds ex- 
actly if 9 is defined by (109). This is in contrast to the situation 
for the Wright-Fisher model, where (93) is only an approximation. 
Thus with the Moran model definition of 9, (94), (95), (96), (97), 
and (98) are all exact, as is also the conditional distribution formula 
(152) that derives from them. It is interesting to ask why these 
formulas hold exactly in the Moran model, not only in a sample 
but also in the population, and also why sample formulas and pop- 
ulation formulae are identical, with the replacement of n for 2N. 
Coalescent theory, which we now turn to, explains this fact. 

The coalescent 
Introduction 

The concept that is most frequently used for inferential and other 
purposes in population genetics is the Kingman coalescent (King- 
man (1982)) . In this section a description of the most immediate 
properties of the simple coalescent process is given. Complications 
that in practice must be taken into account (for example changes 
over time in the size of the population under consideration) are not 
discussed. One of the main values of the coalescent is to provide a 
coherent framework within which to view various properties of the 
models considered above. 

Two technical results 

It is convenient to start with two technical results, one of which will 
be relevant for approximations in the coalescent associated with the 
Wright-Fisher model, and by implication the Cannings model, while 
the other will be relevant for exact Moran model calculations. 

We consider first a Poisson process in which events occur inde- 
pendently and randomly in time, with the probability of an event in 
(t, t + St) being aSt. (Here and throughout we ignore terms of order 
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(St) 2 .) We call a the rate of the process. Standard Poisson process 
theory shows that the density function of the (random) time X be- 
tween events, and until the first event, is f(x) — a e~ ax , and thus 
that the mean time until the first event, and also between events, is 
l/a. 

Consider now two such processes, process (a) and process (b), 
with respective rates a and b. From standard Poisson process theory, 
given that an event occurs, the probability that it arises in process 
(a) is ajia + b). The mean number of "process (a)" events to occur 
before the first "process (b)" event occurs is a/b. More generally, 
the probability that j "process (a)" events occur before the first 
"process (b)" event occurs is 



The mean time for the first event to occur under one or the other 
process is l/(a + b). Given that this first event occurs in process (a), 
the conditional mean time until this first event occurs is equal to 
the unconditional mean time, namely l/(a+b). The same conclusion 
applies if the first event occurs in process (b). 

Similar properties hold for the geometric distribution. Consider a 
sequence of independent trials and two events, event A and event B. 
The probability that one of the events A and B occurs at any trial 
is a + b. The events A and B cannot both occur at the same trial, 
and given that one of these events occurs at trial i, the probability 
that it is an A event is a/(a + b). 

Consider now the random number of trials until the first event 
occurs. This random variable has geometric distribution, and takes 
the value i, % — 1, 2, ... , with probability (1 — a — b) l ^ 1 (a + b). The 
mean of this random variable is thus l/(a + b). The probability that 
the first event to occur is an A event is a/ (a + b). Given that the 
first event to occur is an A event, the mean number of trials before 
the event occurs is l/(a + b). In other words, this mean number of 
trials applies whichever event occurs first. The similarity of proper- 
ties between the Poisson process and the geometric distribution is 
evident. 




b 



(161) 



•57 



Approximate results for the Wright-Fisher model - no mu- 
tation 

With the above results in hand, we first describe the general concept 
of the coalescent process. To do this, we consider the ancestry of a 
sample of n genes taken at the present time. Since our interest is in 
the ancestry of these genes, we consider a process moving backward 
in time, and introduce a notation acknowledging this. We consis- 
tently use the notation r for a time in the past before the sample 
was taken, so that if t<i > Ti, then r 2 is further back in the past than 

is T\ . 

We describe the common ancestry of the sample of n genes at 
any time r through the concept of an equivalence class. Two genes 
in the sample of n are in the same equivalence class at time r if 
they have a common ancestor at this time. Equivalence classes are 
denoted by parentheses: Thus if n — 8 and at time r genes 1 and 2 
have one common ancestor, genes 4 and 5 a second, and genes 6 and 
7 a third, and none of the three common ancestors are identical, the 
equivalence classes at time time r are 



Such a time r is shown in Figure 1. 

We call any such set of equivalence classes an equivalence relation, 
and denote any such equivalence relation by a Greek letter. As two 
particular cases, at time r = the equivalence relation is <f>i = 
{(1), (2), (3), (4), (5), (6), (7), (8)}, and at the time of the most recent 
common ancestor of all eight genes, the equivalence relation is (j> n = 
{(1, 2, 3,4, 5, 6, 7, 8)}. The coalescent process is a description of the 
details of the ancestry of the n genes moving from (pi to <f> n . 

Let £ be some equivalence relation, and t] some equivalence rela- 
tions that can be found from £ by amalgamating two of the equiv- 
alence classes in £. Such an amalgamation is called a coalescence, 
and the process of successive such amalgamations is called the co- 
alescence process. It is assumed that, if terms of order (St) 2 are 
ignored, and given that the process is in £ at time r, 




(3), (4,5), (6,7), (8). 



(162) 



Prob (process in r] at time r + St) = St, 
and if j is the number of equivalence classes in £, 

Prob (process in £ at time r + St) = 1 — - — — 




St. 



(163) 



(164) 
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1 2 3 4 5 6 7 



Figure 1: The coalescent 



The above assumptions are clearly approximations for any discrete- 
time process, but they are precisely the assumptions needed for the 
Wright-Fisher approximate coalescent theory. 

The coalescent process defined by (163) and (164)consists of a 
sequence of n — 1 Poisson processes, with respective rates j(j — 
l)/2, j = n, n — 1,...,2, describing the Poisson process rate at 
which two of these classes amalgamate when there are j equivalence 
classes in the coalescent. Thus the rate j(j — l)/2 applies when 
there are j ancestors of the genes in the sample for j < n, with the 
rate n(n — l)/2 applying for the actual sample itself. 

The Poisson process theory outlined above shows that the time 
Tj to move from an ancestry consisting of j genes to one consisting of 
j — 1 genes has an exponential distribution with mean 2/{j(j — 1)}. 
Since the total time required to go back from the contemporary 
sample of genes to their most recent common ancestor is the sum 
of the times required to go from j to j — 1 ancestor genes, j = 
2,3, ... ,n, the mean -E(T MR cas) is> immediately, 

B(WAsH2 g_^ 2 g_l_. (165) 
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This time is 2 — 2/n coalescence time units, and it requires a mul- 
tiplicative scaling factor of 2N to convert to a "generations" ba- 
sis when applied to the Wright-Fisher model. In other words, 
^(^mrcas) = 4iV - AN/n generations. 

It is clear from (165) that about half this mean time relates to 
the final coalescence of two lines of ascent into one. This observation 
gives some idea of the shape of the coalescent tree: The long arms 
tend to arise when there is a very small number of genes in the 
ancestry of the sample. 

The times Tj,j = l,2,...,n — 1, are independent, so that the 
variance of Tmrcas is the sum of the variances of the Tj. Standard 
calculations show that this is approximately 4n 2 /3 — 12, or about 
1.16, (squared) time units. This implies a standard deviation of 
about 2.16 generations. 

The complete distribution of Tmrcas is a l so known (Tavare (2004)). 
However the expression is complicated and we do not reproduce it 
here, other than to note the inequalities 

e"* < Prob (Tmrcas > t) < e~ 3t . 

If the above theory were to apply to the entire population of 
genes in a Wright-Fisher model, the mean -E'(Tmrcap) of the total 
time to arrive at the most recent ancestor gene of all the genes in 
the population (MRCAP) would be found by putting n = 2N, to 

get 

£(T M rcap) = 4A - 2 (166) 

generations. Although coalescent theory does not apply directly 
to the entire population, the mean number of generations given in 
(166) is essentially correct. The reason for this is implicit in an ob- 
servation made above, that the long arms in any coalescent process 
tend to arise when the number of genes in the ancestry of the genes 
considered is small, and for such small numbers the assumptions for 
the coalescent process hold. 

Approximate results for the Wright-Fisher model with mu- 
tation 

We now introduce mutation, and suppose that the probability that 
any gene mutates in the time interval (r + 5t,t) is (9/2)St. All 
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mutants are assumed to be of new allelic types. Following the coa- 
lescent paradigm, we trace back the ancestry of a sample of n genes 
to the mutation forming the oldest allele in the sample. As we go 
backward in time along the coalescent, we shall encounter from time 
to time a "defining event" , taken either as a coalescence of two lines 
of ascent into a common ancestor or a mutation in one or other of 
the lines of ascent. Figure 2 describes such an ancestry, identical to 
that of Figure 1 but with crosses to indicate mutations. 




Figure 2: The coalescent with mutations 



We exclude from further tracing back any line in which a muta- 
tion occurs, since any mutation occurring further back in any such 
line does not appear in the sample. Thus any such line may be 
thought of as stopping at the mutation, as shown in Figure 3 (de- 
scribing the same ancestry as that in Figure 2). 

If at time r there are j ancestors of the n genes in the sample, 
the probability that a defining event occurs in (r, r + St) is 

- 1)St + 1 -jB5t = + 9 - 1)5t, (167) 

the first term on the left-hand side arising from the possibility of a 
coalescence of two lines of ascent, and the second from the possibility 
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1 2 3 4 5 6 7 8 
Figure 3: Tracing back to, and stopping at, mutational events 

of a mutation. 

If a defining event is a coalescence of two lines of ascent, the 
number of lines of ascent clearly decreases by 1. The fact that 
if a defining event arises from a mutation we exclude any further 
tracing back of the line of ascent in which the mutation arose implies 
that the number of lines of ascent also decreases by 1. Thus at 
any defining event the number of lines of ascent considered in the 
tracing back process decreases by 1. Given a defining event leading 
to j genes in the ancestry, the Poisson process theory described 
above shows that, going backward in time, the mean time until the 
next defining event occurs is 2/{j(j + 6 — 1)}, and that the same 
mean time applies when we restrict attention to those defining events 
determined by a mutation. 

Thus starting with the original sample and continuing up the 
ancestry until the mutation forming the oldest allele in the sample 
is reached, we find that the mean age of the oldest allele in the 
sample is 




(168) 
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coalescent time units. The value in (168) must be multiplied by 2N 
to give this mean age in terms of generations. 

The time backward until the mutation forming the oldest allele in 
the sample, whose mean is given in (168), does not necessarily trace 
back to, and past, the most recent common ancestor of the genes 
in the sample (MRCAS), and will do so only if the allelic type of 
the MRCAS is represented in the sample. This observation can be 
put in quantitative terms by comparing the MRCAS given in (165) 
to the expression in (168). For small 9, the age of the oldest allele 
will tend to exceed the time back to the MRCAS, while for large 
9, the converse will tend to be the case. The case 9 = 2 appears 
to be a borderline one: For this value, the expressions in (165) and 
(168) differ only by a term of order n~ 2 '. Thus for this value of 9, 
we expect the oldest allele in the sample to have arisen at about the 
same time as the MRCAS. 

The competing Poisson process theory outlined above shows that, 
given that a defining event occurs with j genes present in the an- 
cestry, the probability that this is a mutation is 9/ (j — 1 + 9). Thus 
the mean number of different allelic types found in the sample is 



and this is the value given in (95). The number of "mutation- 
caused" defining events with j genes present in the ancestry is, of 
course, either or 1, and thus the variance of the number of different 
allelic types found in the sample is 



This expression is easily shown to be identical to the variance for- 
mula (96). 

Even more than this can be said. The probability that exactly k 
of the defining events are "mutation-caused" is clearly proportional 
to 9 k /{9(9 + 1) • • • {9 + n — 1)}, the proportionality factor not de- 
pending on 9. Since this is true for all possible values of 9 and since 
the sum of the probabilities over k — 1, 2, . . . , n must be 1, the prob- 
ability distribution of the number of different alleles in the sample 
must be given by (94). 
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The complete distribution of the allelic configuration in the sam- 
ple as given in (93) is not so simply derived. Kingman (1982), to 
whom coalescent theory is due, employed the full machinery of the 
coalescent process, together with a combinatorial argument consid- 
ering all possible paths from <p n to 0i, to derive (93). That is, (93) 
derives immediately from, and is best thought of as a consequence of, 
the coalescent properties of the ancestry of the genes in the sample. 

The sample contains only one allele if no mutants occurred in the 
coalescent after the original mutation for the oldest allele. Moving 
up the coalescent, this is the probability that all defining events 
before this original mutation is reached are amalgamations of lines 
of ascent rather than mutations. The probability of this is 

ff j = (169) 

f = \{j + e) (l + 0)(2 + 0)...(n-l + 0)' 1 ; 

and this agrees, as it must, with the expression in (97). 



Exact results for the Moran model - no mutation 

We now turn to exact coalescent results for the Moran model. These 
are found in a manner similar to that used above, with the time unit 
used corresponding to the time between one birth and death event 
and the next. 

As we did for the Wright-Fisher model, we first consider the 
coalescent process itself. Here, however, we use a coalescent theory 
that is not only exact, but that also applies for a sample of any 
size, and in particular to the entire population of genes itself. This 
implies that all results deriving from coalescent theory, for example 
the topology of the coalescent tree, are identical to corresponding 
results for the exact Moran model coalescent process. 

It is convenient to think of a gene that does not die in a birth and 
death event as being its own descendant after that event has take 
place. Consider, then, a sample of n genes, where n is not restricted 
to be small and could be any number up to and including the entire 
population size of 2N. As we trace back the ancestry of these n genes 
we will encounter a sequence of coalescent events reducing the size of 
the ancestry ton — l,n — 2,... genes and eventually to one gene, the 
most recent common ancestor of the sample. Suppose that in this 
process we have just reached a time when there are exactly j genes 
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in this ancestry. These will be "descendants" of j — 1 parental genes 
if one of these parents was chosen to reproduce and the offspring is in 
the ancestry of the sample of n genes. The probability of this event 
is j{] - l)/(2iV) 2 . With probability 1 - l)/(2iV) 2 the number of 
ancestors remains at j. It follows that, as we trace back the ancestry 
of the genes, the number Tj of birth and death events between the 
times when there are j ancestor genes and j — 1 ancestor genes has, 
exactly, a geometric distribution with parameter j(j — 1)/ (2iV) 2 and 
thus with mean (2N) 2 /{j(j — 1)}. From this, the mean of the time 
Tmrcas until the most recent common ancestor of all the genes in 
the sample is given by 

£(T MRC as) = £ = (2iV) 2 (l - i) (170) 

birth and death events. In the particular case n = 2N this is 

£(T M rcap) = 2N(2N - 1) (171) 

birth and death events. 

Since the various 7}'s are independent, the variance of Tmrcap is 
the sum of the variances of the 7}'s. This is 

var(T MRCAS ) = ± - ± -p^-. (172) 

The complete distribution of T MRC ap can be found, but the resulting 
expression is complicated and is not given here. 

Exact results for the Moran model with mutation 

We now introduce mutation. Consider again a sample of n genes and 
the sequence of birth and death events that led to the formation of 
this sample. We again trace back the ancestry of the n genes in the 
sample, and consider some birth and death event when this ancestry 
contains j — 1 genes. With probability j/2N the newborn created in 
the population at this birth and death event is in the ancestry of the 
sample, and with probability u is a mutant. That is, the probability 
that at this birth and death event a new mutant gene is added to the 
ancestry of the sample is ju/(2N). As for the Wright-Fisher model, 
we trace back upward along the lines of ascent from the sample, and 
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do not trace back any further any line of ascent at a time when a 
new mutant arises in that line, so that at any mutation, the number 
of lines of ascent that we consider decreases by 1. 

A further decrease can occur from a coalescence for which the 
addition of a newborn to the ancestry of the sample does not produce 
a mutant offspring gene. If at any time there are j lines in the 
ancestry, the probability of a coalescence not arising from a mutant 
newborn is j(j — 1)(1 — u)/(2N) 2 . 

It follows from the above that the number of lines of ascent from 
the sample will decrease from j to j — 1 at some birth and death 
event with total probability 

ju_ j(j - 1)(1 - u) = 2Nju + j(j - 1)(1 - u) 

2N (2N) 2 (2N) 2 ' 1 } 

We write the left-hand side as Vj + Wj , where Vj and Wj are defined 
in (125). The number of birth and death events until a decrease in 
the number of lines of ascent from j to j — 1 follows a geometric 
distribution with parameter Vj + Wj. It follows from the competing 
geometric theory given above that the mean number of birth and 
death events until the number of lines of ascent decreases from j to 
j — 1 is 1/ (vj +Wj), and that this mean applies whatever the reason 
for the decrease. Tracing back to the mutation forming the oldest 
allele in the sample, we see that the mean age of this oldest allele 
is, exactly, 

^— , (174) 

-f Vj + Wj 

where v 3 - and Wj are defined in (125). 

The probability that a decrease in the number of ancestral lines 
from j to j — 1, given that such a decrease occurs, is Vj/(vj + Wj), or, 
using the Moran model definition of 9, more simply as 9/ (j — 1 + 9). 
The mean number of different alleles in the sample is thus, exactly, 

as given by (95). Extending this argument as for the Wright-Fisher 
case, the exact distribution of the number of alleles in the sample is 
found to be given by (94), as expected. 
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The complete distribution of the sample allelic configuration, as 
with the Wright-Fisher model, requires a full description of the co- 
alescent process. 

The argument just used, while expressed as one concerning a 
sample of genes, applies equally for the entire population of genes. 
This occurs because, even in the entire population, at most one coa- 
lescent event can occur at each birth and death event. Thus all the 
exact sample Moran model results found by coalescent arguments 
apply for the population as a whole, with n being replaced by 2N. 
This explains the identity of the form of many exact Moran model 
sample and population formulas. 

"Age" and "frequency" results 

Many elegant results are found when considering the ages of alleles 
and the frequencies of alleles when ordered by their ages. In this 
section we consider both "frequency" and "age" results, starting 
with the former. 

Approximate Wright-Fisher model "frequency" results 

We consider first approximate results applying for the Wright-Fisher 
model. These follow largely from the so-called GEM distribution, 
named for Griffiths, (1980), Engen (1975) and McCloskey (1965), 
who established its salient properties. This distribution can be found 
in the following way. 

Suppose that a gene is taken at random from the population. The 
probability that this gene will be of an allelic type whose frequency 
in the population is x is just x. The frequency spectrum shows im- 
mediately that the (random) frequency of the allele determined by 
this randomly chosen gene is 

f{x) = 0(1 - xf- 1 . (176) 

Suppose now that all genes of the allelic type just chosen are re- 
moved from the population. A second gene is now drawn at random 
from the population and its allelic type observed. The (relative) 
frequency of the allelic type of this gene among the genes remaining 
at this stage is also given by (176). All genes of this second allelic 
type are now also removed from the population. A third gene is 
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then drawn at random from the genes remaining, its allelic type ob- 
served, and all genes of this (third) allelic type removed from the 
population. This process is continued indefinitely. At any stage, the 
distribution of the (relative) frequency of the allelic type of any gene 
just drawn among the genes left when the draw takes place is given 
by (176). This leads to the following representation. Denote by 
Wj the original population frequency of the jth allelic type drawn. 
Then we can write w\ — x±, and for j — 2, 3, . . ., 



where the independent random variables, each having the 

distribution (176). The random vector (wi,w 2 , ■ ■ ■) is then defined 
as having the GEM distribution. 

All the alleles in the population at any time eventually leave the 
population, through the joint processes of mutation and random 
drift, and any allele with current population frequency x survives 
the longest with probability x. That is, the GEM arises when alleles 
are labelled according to the length of their future persistence in 
the population. Reversibility arguments then show that the GEM 
distribution also applies when the alleles in the population are la- 
belled by their age. In other words, the vector (wi,w 2 , ■ ■ .) can be 
thought of as the vector of allelic frequencies when alleles are ordered 
with respect to their ages in the population (with allele 1 being the 
oldest). 

The elegance of many age-ordered formulas derives directly from 
the simplicity and tractability of the GEM distribution. Here are 
two examples. First, the GEM distribution shows immediately that 
the mean population frequency of the oldest allele in the population 
is 



Jo L ^ u 
and more generally that the mean population frequency of the jth 
oldest allele in the population is 



Wj = (1 - Xi){l - X 2 ) ■ • ■ (1 - Xj-i)Xj. 



(177) 




(178) 




Second, the probability that a gene drawn at random from the 
population is of the type of the oldest allele is the mean frequency 
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of the oldest allele, namely 1/(1 + 0), as just shown. More generally, 
the probability that n genes drawn at random from the population 
are all of the type of the oldest allele is 

f l 77! 

9 x n (l -x) e - l dx = — -. -. 

Jo (l + 0)(2 + 0)...(n + 0) 

The probability that n genes drawn at random from the popula- 
tion are all of the same unspecified allelic type is 

6 r - x f~ l dx = T, ^ n 7T' 

Jo (l + 0)(2 + 0)---(n + 0-l)' 

in agreement with (97). From this, given that n genes drawn at 
random are all of the same allelic type, the probability that they 
are all of the allelic type of the oldest allele is n/(n + 9). 

A question of some interest is to find the probability that the old- 
est allele in the population is also the most frequent. By reversibility 
arguments this is also the probability that the most frequent allele 
in the population will survive the longest into the future, and in 
turn this is the mean of the frequency of the most frequent allele. 
Unfortunately, the distribution of the frequency of the most frequent 
allele is not user-friendly. A lower bound for the mean frequency of 
the most frequent allele is (1/2) 61 , which is useful for small but not 
of much value for larger 0, and an upper bound is 1 — 0(1 — 0) log 2. 
When = 1 this mean is 0.624..., which may be compared with the 
mean frequency of the oldest allele (which must be less than the 
mean frequency of the most frequent allele) of 0.5. 

Exact Moran model "frequency" results 

It will be expected that exact results, corresponding to those given 
above for the Wright-Fisher model, hold for the Moran model, with 
9 defined, as always for this model, as 2Nu/{l — u). The first of 
these is an exact representation of the GEM distribution, analogous 
to (177). This has been provided by Hoppe (1987). Denote by 
Ni,N 2 ,... the numbers of genes of the oldest, second-oldest, ... 
alleles in the population. Then Aq, N 2 , . . . can be defined in turn by 

Ni = l + Mi, 2 = 1,2,..., (179) 

where Mj has a binomial distribution with index 2N — Ni — N 2 — 
Ni_x — 1 and parameter x i: where x±, x 2 , ■ ■ ■ are iid continuous 
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random variables each having the density function (176). Eventually 
the sum Ni + 7V 2 + ■ ■ ■ + N k reaches the value 27V and the process 
then stops, the final index k being identical to the number K 2 n of 
alleles in the population. 

It follows directly from this representation that the mean of Ni 

is 

27V + 



1 + (27V -1)0 f x^-xf^dx = 
Jo 



1 + 

The mean of the proportion NJ(2N) is 1/{1 + (27V - 1)«}, which 
is very close to the approximation 1/{1 + 0}. 

If there is only one allele in the population, this allele must be 
the oldest one in the population. The above representation shows 
that the probability that the oldest allele arises 2N times in the 
population is 

Prob (Mi = 2N - 1) = 9 [ x 2N -\l - x) 9 - 1 dx. 

Jo 

This probability also follows immediately from the fact that an allele 
arising 2N times in the population must be the oldest allele, and is 
given more simply (see (110)) as 

(27V -1)! , . 

(180) 



(l + 0)(2 + 0)---(27V-l + 0) 



More generally, Kelly (1977) has shown that the probability that 
the oldest allele is represented by j genes in the population is 



9 /27V\ f2.\ ■ 9 ' 



(181) 



2N\j J\ j 

or more simply, 

0(2iV - 1)(2N - 2) ■ ■ ■ (27V - j + 1) 
(27V + 9 - j)(2N + 9 - j + 1) • • • (27V + 9 - 1) 

The case j = 27V given in (180) is clearly a particular example of 
this, and the mean number (27V + 0)/(l + 9) follows from (182). 



(182) 



Approximate Wright-Fisher model "age" results 

We now turn to "age" questions, considering first approximate Wright- 
Fisher results.. Some for these follow immediately from the previous 
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calculations. For example, the mean time for all the alleles existing 
in the population at any time to leave the population is given in (44), 
and by reversibility this is the mean time, into the past, that the 
oldest of these originally arose by mutation. This is then the mean 
age of the oldest allele in the population, given on a "generations" 
basis. In other words, 

— generations. (183) 

j=1 JU + e - 1 ) 

In the case = 2, this mean age is very close to 4N — 2, that is, to 
the conditional mean fixation time (45). 

If an allele is observed in the population with frequency p, what 
is its mean age? By reversibility, this is the mean time t{p) that it 
persists in the population, and in the Wright-Fisher model approx- 
imation for this is found immediately from (40) as 



This is clearly a generalization of the expression in (183), since if 
p — 1, only one allele arises in the population, and it must then be 
the oldest allele. 

A question whose answer follows from the above calculation is the 
following: If a gene is taken at random from the population, what 
is the diffusion approximation for the mean age of its allelic type? 
With a change of notation, the density function of the frequency 
p of the allelic type of the randomly chosen gene is, from (176), 
f(p) = 8(1 —p) e ~ l . The mean age i(p) of an allele with frequency p 
is, by reversibility, given by (40). The required probability is 



and use of (40) for i(p) shows that this reduces to 2/9 time units, or 
for the Wright-Fisher model, 1/u generations. This conclusion may 
also be derived by looking backward to the past and using coalescent 
arguments. It is also an immediate result. Looking backward to 
the past, the probability that the original mutation creating the 
allelic type of the gene in question occurred j generations in the 




(184) 




(185) 
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past is u(l — uY" 1 , j — 1,2, ... , and the mean of this (geometric) 
distribution is 1/u. 

We turn now to sample properties, which are in practice more 
important than population properties. The most important sam- 
ple distribution concerns the frequencies of the alleles in the sample 
when ordered by age. This distribution was obtained d by Don- 
nelly and Tavare (1986), who found the probability that the num- 
ber K n of alleles in the sample takes the value k, and that the 
age-ordered numbers of these alleles in the sample are (in age order) 
n (i),n(2), ■ ■ ■ ,n(k)- This probability is 



Sn{0)n( k ){n(k) + rc(fc-i)) • • • (n(k) + n^k-i) H ^ n (2)Y 

where Sj(0) is defined below (93). 

Griffiths and Tavare (1998) give the Laplace transform of the 
distribution of the age of an allele observed b times in a sample of 
n genes, together with a limiting Laplace transform for the case 
in which 6 approaches 0. These results show that the diffusion 
approximation for the mean age of such an allele is 



generations, where is defined as = a(a + 1) • • • (a + j — 1). 
This is the sample analogue of the population expression in (184), 
and it converges to (184) as n — > oo with b = np. 

Our final calculation concerns the mean age of the oldest allele 
in the sample. For the Wright-Fisher model this mean age is 



For the case n = 2N this is the value given in (44), and for the case 
n — 1 it reduces to the value 1/u given above. 

Exact Moran model age results 

We now turn to exact Moran model "age" results. The exact re- 
sult corresponding to (183) for the Moran model is given in (122), 




(187) 




(188) 
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or equivalently in (123), being almost exactly 4N 2 birth and death 
events. The exact Moran model calculation corresponding to (184) 
follows from the mean persistence time found eventually using stan- 
dard continuant Markov chain theory and (71). The Moran model 
calculations parallel to those leading from (185) use the exact fre- 
quency spectrum (113) and the exact mean age deriving from (71). 
However, a direct argument parallel to that given for the Wright- 
Fisher model shows that the exact mean time, measured in birth 
and death events, is 2N/u. 

The expression (186) is exact for the Moran model with 9 de- 
fined as 2Nu/(l — u). Several results concerning the oldest allele in 
the sample can be found from this formula, or in some cases more 
directly by other methods. For example, the probability that the 
oldest allele in the sample is represented by j genes in the sample is 



This is identical to the expression (181) if 2N is replaced by n in 
the latter. 

Further exact Moran model results provide connections between 
the oldest allele in the sample and the oldest allele in the population. 
For example, Kelly (1976) showed that in the Moran model, the 
probability that the oldest allele in the population is observed at all 
in the sample is n(2N + 6)/[2N(n + 0)]. This is equal to 1, as it 
must be, when n = 2N, and when n — 1 it reduces to a result found 
above that a randomly selected gene is of the oldest allelic type in 
the population. (The Wright-Fisher model approximation to this 
probability, found by letting N — > oo, is n/(n + 6).) 

A further result is that the probability that a gene seen j times 
in the sample is of the oldest allelic type in the population is j(2N + 
9)/[2N(n + 9)]. (Letting iV — > oo, the Wright-Fisher model approxi- 
mation for this probability is j / (n + 9). When n = j this is j / (j + 9), 
a result found above found by other methods.) 

Donnelly (1986)) provides further formulas extending these. He 
showed, for example, that the probability that the oldest allele in 
the population is observed j times in the sample is 



(Kelly, (1976)) 




(189) 




n^^n + 9 - 1 



-i 



n + 9 \ 



3 = 0,1,2,..., n. 



(190) 
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This is, of course, closely connected to the Kelly result (189). For 
the case j — this probability is 0/(n + 6), confirming the com- 
plementary probability n/(n + 6) found above. Conditional on the 
event that the oldest allele in the population does appear in the 
sample, a straightforward calculation using (190) shows that this 
conditional probability and that in (189) are identical. 

The result corresponding to (188) for the Moran model is 



birth and death events, with (of course) 9 defined as 2Nu/(l — u). 
When n = 1 this reduces to the calculation 2N/u found above. 
When n = 2N it is identical to (123) and, less obviously, to the 
expression given in (122). 

As noted in the discussion following (123), the expression in (191) 
may be written equivalently as 



where Vj and Wj are defined in (125). Coalescent arguments explain 
why the mean age of the oldest allele in a sample can be expressed in 
this form and why the mean age of the oldest allele in the population 
can similarly be expressed in the form defined by (124) and (125). 

Testing neutrality 
Introduction 

Almost all the theory discussed so far, and in particular all of the 
coalescent theory described, assumes selective neutrality at the gene 
locus considered. In this section we consider the question: May we 
in fact reasonably assume selective neutrality at this gene locus? 

The hypothesis of selective neutrality is more frequently called 
the "non-Darwinian" theory, and was promoted mainly by Kimura 
(1968). Under this theory it is claimed that, whereas the gene sub- 
stitutions responsible for obviously adaptive and progressive phe- 
nomena are clearly selective, there exists a further class of gene 
substitutions, perhaps in number far exceeding those directed by 




(191) 




(192) 
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selection, that have occurred purely by chance stochastic processes. 
A better name for the theory would thus be the "extra-Darwinian" 
theory, although here we adhere to the standard expression given 
above. 

In a broader sense, the theory asserts that a large fraction of 
currently observed genetic variation between and within populations 
is nonselective. In this more extreme sense the theory has been 
described as the "neutral alleles" theory, although this term and 
the term "non-Darwinian" have been used interchangeably in the 
literature and will be so used here. 

This theory has, of course, been controversial, not only among 
theoreticians but also among practical geneticists, and the question 
whether certain specific substitutions have been neutral has been 
argued for decades. We do not refer here to the extensive literature 
on this matter. 

In statistical terms the neutral theory is the "null hypothesis" 
to be tested, and all calculations given here assume that this null 
hypothesis is true. Most tests in the current literature relate to 
"infinitely many sites" data: here we consider both these tests and 
those tests that use "infinitely many alleles" data. 



Tests based on the infinitely many alleles model 

The first objective tests of selective neutrality based on the infinitely 
many alleles model were put forward by Ewens (1972) and Watter- 
son (1977). The broad aim of both tests was to assess whether the 
observed values {ai, . . . , a n } in (152) conform reasonably to what is 
expected under neutrality, that is, under the formula (152), given 
the sample size n and the observed number k of alleles in the sample. 
It is equivalent to use the observed numbers {rii, . . . , n^} defined in 
connection with (98) and to assess whether these conform reason- 
ably to their conditional probability given n and k, namely, 

n\ 

Prob(ni,n 2 ,...,n fc |A;) = — — — : . (193) 

|^|fc!nin 2 ■■■n k 

The Ewens and the Watterson testing procedures differ only in 
the test statistic employed, and here we discuss only the (superior) 
Watterson procedure. This uses as test statistic the observed sample 
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homozygosity, defined as 

/ = £ % <m 

3=1 

The first aim is to establish what values of / will lead to rejection 
of the neutral hypothesis. Clearly, / will tend to be smaller under 
selection favoring heterozygotes than under neutrality, since this 
form of selection will tend to equalize allele frequencies compared 
to that expected for the neutral case, thus tending to decrease /. 
If we expect one high-frequency "superior" allele and a collection of 
low-frequency deleterious alleles, / will tend to exceed its neutral 
theory value. Thus the hypothesis of neutrality is rejected if / is 
"too small" and also if / is "too large" . 

To determine how large or small / must be before neutrality is 
rejected, it is necessary to find its neutral theory probability dis- 
tribution. This may be found in principle from (193). In practice, 
difficulties arise with the mathematical calculations because of the 
form of the distribution (193), and other procedures are needed. 

For any observed data set {rii, . . . , n^}, a computer-intensive ex- 
act approach proceeds by taking n and k as given, and summing 
the probabilities in (193) over all those ni, n 2 , . . . ,nk combinations 
that lead to a value of / more extreme than that determined by the 
data. This procedure is increasingly practicable with present-day 
computers, but will still be difficult in practice if an extremely large 
number of sample points is involved. 

An approximate approach is to use a computer simulation to 
draw a large number of random samples from the distribution in 
(193): Efficient ways of doing this are given by Watterson (1978). 
If a sufficiently large number of such samples is drawn, a reliable 
empirical estimate can be made of various significance level points. 
This was done by Watterson (1978): see his Table 1. 

The simulation method allows calculation of tables of E(f\k) and 
var(/|fc) for various k and n values, which are of independent interest 
and are given (for the data of Table 1) in Table 2. 

We illustrate this test of neutrality by applying it to particular 
data. The data concern numbers and frequencies of different alleles 
at the Esterase-2 locus in various Drosophila species and are quoted 
by Ewens (1974) and Watterson (1977). 



76 



Species 


n 


k 


m 


n,2 


«3 


n 4 


n 5 


n 6 


n 7 


willistoni 


582 


7 


559 


11 


7 


2 


1 


1 


1 


tropicalis 


298 


7 


234 


52 


4 


4 


2 


1 


1 


equinoxalis 


376 


5 


361 


5 


4 


3 


3 






simulans 


308 


7 


91 


76 


70 


57 


12 


1 


1 



Table 1: Drosophila sample data 



Species 


f 


E(/) 


var(/) 


P 


1 sim 


willistoni 


0.9230 


0.4777 


0.0295 


0.007 


0.009 


tropicalis 


0.6475 


0.4434 


0.0253 


0.130 


0.134 


equinoxalis 


0.9222 


0.5654 


0.0343 


0.036 


0.044 


simulans 


0.2356 


0.4452 


0.0255 




0.044 



Table 2: Sample statistics, means, variances, and probabilities for the data of 
Table 11.1. 

For each set of data we compute /, the observed homozygosity. 
Then the exact neutral theory probability P (given in Table 2) that 
the homozygosity is more extreme than its observed value may be 
calculated (except for the D. simulans case where the computations 
are prohibitive). The simulated probabilities P S i m are also given in 
Table 2; these are in reasonable agreement with the exact values. 
The conclusion that we draw is that significant evidence of selection 
appears to exist in all species except D. tropicalis. 

We next outline two procedures based on the sample "frequency 
spectrum". Define as the (random) number of alleles in the 
sample that are represented by exactly i genes. For given k and n, 
the mean value of A { can be found directly from (152) as 

E (^M = i£p ( 195 ) 

v ' i[n — i)\ 

In this formula the Sj are values of Stirling numbers of the first 
kind as discussed after (94). The array of the E(Ai\k,n) values for 
% — 1, 2, . . . , n is the sample conditional mean frequency spectrum, 
and the corresponding array of observed values Oj is the observed 
conditional frequency spectrum. The first approach that we outline 
is an informal one, consisting of a simple visual comparison of the 
observed and the expected sample frequency spectra. Coyne (1976) 
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provides an illustration of this approach. In Coyne's data, n = 21, 
k = 10, and 

m — n 2 — ■ ■ ■ — n 9 — 1, n w = 12. 
Direct use of (71) shows that given that k — 10 and n — 21, 

E(A i \k = l ,n = 2l) = m - w ^, (196) 

and this may be evaluated for i = 1,2,..., 12, the only possible 
values in this case. A comparison of the observed a* values and 
the expected values calculated from (196) is given in Table 3. It 
appears very difficult to maintain the neutral theory in the light of 
this comparison. 
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0.0 


0.0 


1 



Table 3: Comparison of expected (E) and observed (O) sample frequency spec- 
tra. 

A second approach provides a formal test of hypothesis, but fo- 
cuses only on the number A 1 of singleton alleles in the sample. This 
procedure originally assumed selective neutrality and was used to 
test for a recent increase in the mutation rate. However, it may 
equally well be used as a test of neutrality itself if a constant muta- 
tion rate is assumed, especially for any test in which the alternative 
selective hypothesis of interest would lead to a large number of sin- 
gleton alleles. The procedure may be generalized by using as test 
statistic the total number of singleton, doubleton, . . ., j-ton alleles, 
leading to a test in which the selective alternative implies a signifi- 
cantly large number of low-frequency alleles. A parallel procedure, 
using the frequency of the most frequent allele in the data, may also 
be used. 

We describe here only the test based on the number Ai of single- 
ton alleles. The total number k of alleles in the sample is taken as 
given, and the test is based on the neutral theory conditional distri- 
bution of Ai, given k and n. (It is assumed, as is always the case in 
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practice, that n strictly exceeds k.) This conditional distribution is 
independent of 9 and is found from (152) to be 

Prob(A = a\k,n) = E^ 1 ) ^ ■ (197) 

Here Sj is again a Stirling number. The conditional mean of A 1 
is I'S'fe-il/l'S'fcl, an d the distribution (197) is approximately Poisson, 
with this mean. This observation enables a rapid approximate as- 
sessment of whether the number of singleton alleles is a significantly 
large one, assuming selective neutrality. 

Tests based on the infinitely many sites model 
Introduction 

Dr Griffiths has discussed the infinitely many sites model in detail, 
and here we use results for that model which relate to testing the 
neutrality hypothesis. Since the complete nucleotide (i.e. DNA) se- 
quences of genes are now available in large numbers, and since these 
data represent an ultimate state of knowledge of the gene, tests of 
neutrality based on infinitely many sites data are increasingly pop- 
ular. Although several tests have been proposed that use infinitely 
many sites data, here we focus on what is by far the most popular 
of these, namely the Tajima (1989) test. The theory for this test is 
based on the Watterson (1975) infinitely many sites theory, which 
assumes complete linkage (that is, no recombination) between sites. 
It is therefore assumed throughout that the data at hand conform 
to this assumption. In practice this might mean that the DNA se- 
quences in the data relate to a single gene. 

As for tests using infinitely many alleles theory, discussed above, 
it is assumed in all the calculations in this section that selective 
neutrality holds, so that these can be thought of as "null hypothesis" 
calculations. 

We assume a sample of n aligned sequences. The number S 
of sites segregating in the sample is not a sufficient statistic for 
the central parameter 9 describing the stochastic behavior of the 
evolution of these sequences. Indeed, there is no simple nontrivial 
sufficient statistic for 9 for this case. This implies that no direct 
analogue of the exact infinitely many alleles tests is possible. 
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On the other hand, in the infinitely many sites model there are 
several unbiased estimators of 9 when neutrality holds. The ba- 
sic idea of the Tajima test is to form a statistic whose numerator 
is the difference between two such unbiased estimators and whose 
denominator is an estimate of the standard deviation of this differ- 
ence. Although under neutrality these two observed values of these 
estimators should tend to be close, since they are both unbiased es- 
timators of the same quantity, under selection they should tend to 
differ, since the estimators on which they are based tend to differ 
under selection, and in predictable ways. Thus values of the statistic 
formed sufficiently far from zero lead to rejection of the neutrality 
hypothesis. To find the sampling properties of these statistics it is 
necessary first to discuss properties of the various unbiased estima- 
tors of 9 used in them. The theory described below relates to the 
Wright-Fisher model testing procedure. A parallel theory applies 
for other models. 

Estimators of 6 

In this section we consider properties of two statistics that in the 
neutral case are both unbiased estimators of the parameter 9. As 
discussed above, the theory considered in this section concerns only 
the case of completely linked segregating sites. 

The first unbiased estimator of 9 that we consider is that based 
on the number S n of segregating sites. Standard theory (discussed 
by Dr Griffiths) shows that the mean of S n is given by 



n—1 



9J2^/J=9i0, 



3=1 



where 




n-l 



(198) 



3=1 J 

We note for future reference that the variance of S„ is 



var(S' n ) = gi9 + g 2 9 2 , 



(199) 



where 




Tl-l 



(200) 
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Clearly an unbiased estimator of 9 is 



6 S = -■ (201) 

Equation (199) implies that the variance of 0s is 

var(0 5 ) = - + ^ . (202) 
9i 91 

The second unbiased estimator of 6 is found as follows. Suppose 
that the nucleotide sequences % and j in the sample are compared 
and differ at some random number T(i,j) of sites. Then T(i,j) is 
an unbiased estimator of 6. It is natural to consider all (™) possible 
comparisons of two nucleotide sequences in the sample and to form 
the statistic 

T = Tll< ^ {h3 \ (203) 

Since this is also an unbiased estimator of 9, we think of it as forming 
the unbiased estimator 9t, defined by 

»x = ^g'' 7 ' . (204) 

This estimator of 9 was proposed by Tajima (1983). It is a poor 
estimator of 9 in that its variance, namely, 

"+i., + g("'+"+3y =M+M , (205) 



3(n-l) 9ra(n-l) 

does not approach as the sample size n increases. (&i and 62 are 
implicitly defined in this equation.) However, our interest here in 
this estimator is that it forms part of a hypothesis testing procedure, 
and not as a possible estimator of 9. 



The Tajima Test 

The Tajima test in effect compares the values of §t and §s, defined 
above. Specifically, the procedure is carried out in terms of the 
statistic D, defined by 




(206) 
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where V is an unbiased estimate of the variance of 9? — 9s and is 
defined in (208) below. Tajima showed, by using adroit coalescent 
arguments, that the variance V of 9t — 9s is 

V = Cl 6 + c 2 9 2 , (207) 

where 

7 1 i, n + 2 g 2 

Cl = Ol , c 2 = b 2 h -5. 

5-1 ain g{ 

Since this variance depends on 9, any estimate of this variance de- 
pends on a choice of an estimate of 9. 

The variance of the estimator 9s decreases to as the sample size 
increases (although the decrease is very slow), so the Tajima proce- 
dure is to estimate the variance of 9t — 9s by the function of S that 
provides an unbiased estimator of the variance (207). Elementary 
statistical theory shows that this function is 

V = ^ + C ^ S ~ 1 \ (208) 
9i 91 + 92 

This is then used in the D statistic given in (206) above. 

The next task is to find the null hypothesis distribution of D. 
Although D is broadly similar in form to a z-score, it does not have 
a normal distribution and its mean is not zero, nor is its variance 
1, since the denominator of D involves a variance estimate rather 
than a known variance. Further, the distribution of D depends on 
the value of 9, which is in practice unknown. Thus there is no null 
hypothesis distribution of D invariant over all 9 values. 

The Tajima procedure approximates the null hypothesis distri- 
bution of D in the following way. First, the smallest value that D 
can take arises when there is a singleton nucleotide at each site seg- 
regating. In this case 9t is 2S/n, and the numerator in D is then 
{(2/n) — (l/gi)}S. In this case the value of D approaches a, defined 

a = « 2 /"> - fl/gOh/g+I, (209) 

\/C2 

as the value of S approaches infinity. 

The largest value that D can take arises when there are n/2 
nucleotides of one type and n/2 nucleotides of another type at each 
site (for n even) or when there are (n — l)/2 nucleotides of one type 
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and (n + l)/2 nucleotides of another type at each site (for n odd). 
In this case the value of D approaches b, defined by 

b = {(n/2(n-l))-(l/a 1 )} v ^T^ 

\fc~2 



when n is even and the value of S approaches infinity. A similar 
formula applies when n is odd. 

Second, it is assumed, as an approximation, that the mean of D 
is and the variance of D is 1. Finally, it is also assumed that the 
density function of D is the generalized beta distribution over the 
range (a, b), defined by 

{m ,_ na + P){b-DT-\D-aY- 1 r911 . 
A ] ~ r(a)r(/3)(6-a)^- 1 ' 1 ] 

with the parameters a and (5 chosen so that the mean of D is indeed 
and the variance of D is indeed 1. This leads to the choice 

(l + ab)b n (l + ab)a 

a = 1 ' • 

b — a b — a 

This approximate null hypothesis distribution is then used to assess 
whether any observed value of D is significant. 

The various approximations listed above have been examined in 
detail in the literature. It appears that the Tajima procedure is 
often fairly accurate, although examples can be found where this is 
not so. We do not pursue these matters here. 
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